Last Comment Bug 545650 - Downloading (large) files to network drive on Windows 7 results in corrupted files (Bug of SMB2, SetFilePointer returns 0 for the end of file after normal writing. Mozilla relies on the returned offset from OS.)
: Downloading (large) files to network drive on Windows 7 results in corrupted ...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: x86_64 Windows 7
: -- critical with 17 votes (vote)
: mozilla1.9.2
Assigned To: Steve Harper
:
Mentors:
: 512543 564316 621141 (view as bug list)
Depends on: destroys_encrypted 724177
Blocks:
  Show dependency treegraph
 
Reported: 2010-02-11 07:06 PST by peterdk
Modified: 2012-10-07 16:49 PDT (History)
30 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
-
wanted


Attachments
A patch to set the file pointer to the end of file when its opened and the PR_APPEND flag is set (1.66 KB, patch)
2010-04-30 18:53 PDT, Steve Harper
benjamin: review-
wtc: feedback-
Details | Diff | Review
Proposed patch (5.56 KB, patch)
2011-02-01 08:48 PST, Wan-Teh Chang
jmathies: review+
Details | Diff | Review
Proposed patch v2 (5.63 KB, patch)
2011-02-05 10:39 PST, Wan-Teh Chang
jmathies: review+
Details | Diff | Review
Patch to set file pointer to EOF when opening file for first time (3.20 KB, patch)
2011-02-06 09:28 PST, Steve Harper
no flags Details | Diff | Review
Process Monitor log with tryserver build, download to local drive (322.33 KB, text/plain)
2011-02-06 20:19 PST, WADA
no flags Details
Patch to fix bug in Windows 7 when copying a file to a share using SMBV2 (1.43 KB, patch)
2011-02-10 13:33 PST, Steve Harper
no flags Details | Diff | Review
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O. (3.18 KB, patch)
2011-02-13 05:02 PST, Steve Harper
no flags Details | Diff | Review
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O. (3.09 KB, patch)
2011-02-14 14:07 PST, Steve Harper
no flags Details | Diff | Review
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O. (3.57 KB, patch)
2011-02-15 13:17 PST, Steve Harper
no flags Details | Diff | Review
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O. (3.55 KB, patch)
2011-02-15 13:28 PST, Steve Harper
no flags Details | Diff | Review
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O. (3.49 KB, patch)
2011-02-15 15:03 PST, Steve Harper
jmathies: review+
Details | Diff | Review

Description peterdk 2010-02-11 07:06:27 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)

I have a server that I access as a network drive share in Windows 7. 
I was getting worried that my raid5 disk was somehow borked, since all my larger file downloads, that I stored directly to the network drive, got corrupt.

However,when I downloaded these files directly to my harddisk instead of using the network drive, all went well.

It looks like since Firefox 3, somehow the handling of saving downloads to a network drive is corrupted.

Reproducible: Sometimes

Steps to Reproduce:
1.Make a network share in Win7, mapped to a Drive letter
2.Download 10mb+ file in Firefox directly to the network drive
3.open the file

for example: http://download.microsoft.com/download/0/1/1/011F8F7B-18F7-4AFD-A0C9-BD5AEDEF1DBB/dotNetFx40_Full_x86_x64.exe
Actual Results:  
Most of the time the download is corrupt

Expected Results:  
The download should just save correctly to the network.

- It looks like the downloading is only corrupt if a new file is made. When I save it to an existing corrupt file it suddenly works. 


- There is quite a large forum thread here about the problem
http://support.mozilla.com/en-US/forum/1/313478
There are some non-related issues posted there, but some people are having the same issue.
Comment 1 peterdk 2010-02-11 07:14:33 PST
Some relevant excerpts from the forum thread:

=====jim========

I am using Windows 7 x86 and x64 RTM build 7600 and I get this error with Firefox 3.0.11, 3.5.1, and 3.5.2. It occurs only between Win7 OS's. When share is located on an XP machine then it is okay. Or if Firefox is located on an XP machine and the share is on a Win7 machine then it is okay.

IE downloads are fine and do not seem to be affected by this issue. It seems only Firefox has the problem. Perhaps MS planned it this way :)

Trying to find a workaround (or a solution) as I like to keep my downloads centralized in one location.

=====kroff======
same corruption problem with win7 x64 rtm, and a network share on win 2008r2.
i did a clean reinstall of win7, latest firefox, and as soon as i download directly to my network share, i cant open them. if i download on my local disk and then copy on the network share, works without problems.

thats annoying. if i download a file to the network share using ie8 or chrome, no corruption.

=====bradd======
I have the same problem here. Windows 7 Ultimate (bought off the shelf) member of a Windows 2003 R2 domain. I can download to my local computer fine. If I download directly to a mapped network drive and attempt to run the file I get, "The file is not a valid win32 executable." I can download with IE to the mapped network drive okay. I am running 3.5.5. 

=====Riley Foster=====
Windows 7 Ultimate x64
Firefox 3.6 Beta 2

Every file that I download and save directly to a mapped network drive is corrupt. If I save to a local drive first, then copy to the network it works fine.

Server is Windows 2008 R2.
Comment 3 Steve Harper 2010-04-30 18:50:50 PDT
I can reproduce this bug consistently with the link posted by peterdk. I have stepped through the code in the debugger and it looks like this error occurs due to SetFilePointer returning 0 for the end of file after correctly writing a sequence of 32k chunks to the file. 

SetFilePointer should move the file pointer to the end of file before every write so that it can append data to the file when the download is resumed. Do we have to do this before every write(Each subsequent write automatically moves the file pointer). 

I think the problem arises due to a bug in Windows 7 SMB2 where data is cached before been committed to disk(The file is the correct size on the shared drive, but after writing a few chunks to the file the Windows API IO functions see the file as empty with a length of Zero bytes)

I have attached a patch which sets the file pointer to the end of file only when the file it is first opened and the append flag is set. Could someone test that this patch works on there machine? and verify that downloads to a Windows 7 mapped drive work ok?

Hopefully this patch should work around this Windows 7 bug.

Thanks,

Steve
Comment 4 Steve Harper 2010-04-30 18:53:05 PDT
Created attachment 442882 [details] [diff] [review]
A patch to set the file pointer to the end of file when its opened and the PR_APPEND flag is set
Comment 5 Steve Allison 2010-05-01 01:20:59 PDT
I would love to test this but would need a compiled version.
Comment 6 Steve Harper 2010-05-01 07:28:09 PDT
If you like I could send you a compiled debug build. The installer is just under 10meg Do you have an email? or some place I could upload it?

Thanks,

Steve
Comment 7 Steve Allison 2010-05-01 08:26:11 PDT
I'd welcome the oppurtunity. Send to steve@gigabix.net
Comment 8 Kevin Brosnan 2010-05-01 08:47:36 PDT
The patch will need review  from one or more of the module owners in NSPR and/or XPCOM. Find the email for one of them at http://www.mozilla.org/about/owners.html and set the review flag to [?].

https://developer.mozilla.org/En/Developer_Guide/How_to_Submit_a_Patch#Getting_Reviews
Comment 9 Benjamin Smedberg [:bsmedberg] 2010-05-03 08:40:37 PDT
Comment on attachment 442882 [details] [diff] [review]
A patch to set the file pointer to the end of file when its opened and the PR_APPEND flag is set

If PR_APPEND is not behaving correctly, we should fix it in NSPR, not add this hack to XPCOM. Requesting feedback from wtc for the correct solution.
Comment 10 Steve Harper 2010-05-03 15:24:03 PDT
Hello Benjamin,

Thanks for taking the time to look at my patch.

I do not understand why the current code seeks to eof before every write. I would have thought this only needs to happen once when the file is first opened for append? Or am i missing something? Fixing the Windows 7 issue is a side affect of setting the file pointer once.
Comment 11 Matthias Versen [:Matti] 2010-05-06 16:34:27 PDT
*** Bug 512543 has been marked as a duplicate of this bug. ***
Comment 12 Matthias Versen [:Matti] 2010-05-06 16:34:33 PDT
*** Bug 564316 has been marked as a duplicate of this bug. ***
Comment 13 Toby Douglass 2010-05-06 23:02:10 PDT
This bug doesn't need 10mb+ files.  I downloaded the GNUWin32 WGET setup binary, which is 2.87mb, and that was corrupted.
Comment 14 Steve Allison 2010-05-07 00:07:23 PDT
I had the issue downloading files <1mb
Comment 15 Adam 2010-05-11 14:42:25 PDT
I'm getting the same issue with Seamonkey 2.0.4 (Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100317 SeaMonkey/2.0.4) on both Windows 7 x86 and x64 saving to network paths on both Server 2008 R2 and Windows 7.

The issue also causes Seamonkey to become intermittently unresponsive for the duration of the download.

Saving the same file to a local drive does not result in the same issue.
Comment 16 Tyler Retelle 2010-06-28 14:11:52 PDT
From my post in the closed Bug 512543:

"The problem has something to do with when SMB v2 is enabled on Windows 7.  SMB v2 was added to
Windows Vista/Server 2008/7 and is used as the default application-layer
networking protocol between those systems.

I recently upgraded my home PC to Windows 7 and had no trouble downloading
files directly from Firefox to the existing Windows Server 2003 R2 Standard x64
server I was running, most likely because Windows Server 2003 uses SMB v1 for
communication.  I use mapped drives to access the shares on my server.  After
completing my Windows 7 upgrade I upgraded my server to Windows Server 2008 R2
Standard and shortly thereafter I noticed that files I downloaded from my
Windows 7 PC directly to the server using Firefox would be corrupted.  Upon
investigation I discovered that even though Firefox was telling me the download
completed the file size on the server would be too small (ex 30-50KB missing
out of 100MB), as if not all of the data successfully transferred.  After a lot
of hours of testing I discovered that if you force either the client or server
to use SMB v1 instead of SMB v2 the problem would immediately cease.

I had a Windows Vista PC that I was also testing with and even when using SMB
v2 Firefox would not corrupt the download when downloading directly to the
server.  This issue only occurs with Windows 7 clients.

Steps to reproduce:

1. Using a Windows 7 PC, download a file directly to a Windows 7/Server 2008 (mapped drive or UNC path) using SMB v2 (the default protocol).
2. Firefox reports the download completes but when you try to use the file it
is corrupt.  Larger downloads (100MB+) will corrupt every time, smaller
downloads have a chance of corrupting.  Downloading the file to your local PC
is always successful, and upon further investigation the file on the server
(corrupt) is always smaller than the file on the local PC (intact).

Workaround: Disable SMB v2 on the Windows 7 client or on the destination
server/client.

I hope this can be picked up by someone and fixed quickly, as I noticed that
the original bug was opened 4.5 months ago.  It has the potential to become a
serious problem as companies using Windows Server 2008 upgrade to Windows 7, as
many of them use mapped drives.

I can perform any further testing that is needed."

I hope we can get this bug fixed soon, as it has over 10 months since the original report in 512543.  I haven't tested with the latest 3.6.6 version, but I will try to later today.
Comment 17 krikkit 2010-07-16 12:13:27 PDT
i can confirm this bug, today even tested firefox 4.0 beta 1 and the bug is still there... tested with windows 7 x64 enterprise with fileshare on windows 7 client, but also tested with fileshare on windows server 2008 r2... can't really understand why such a bug is still not fixed 5 month after report... :(
Comment 18 Steve Harper 2010-07-17 07:17:11 PDT
It's a difficult bug to pin down. I have tried reproducing the same scenario using the same API's firfox uses but I cannot reproduce the issue. I suspect it to be an issue in the Windows 7 implementation of SMBV2. But the only known issues I can find documented on the internet relate to file locking. 

I added some logging code to Firefox and noticed that the Windows API call "SetFilePointer" would set the file pointer to zero after writing the first few 32kb chunks to a file on a remote share. Subsequent calls to write would then overwrite existing data in the file(SetFilePointer is called before every write when the PR_APPEND flag is set). In my tests removing the SetFilePointer call appeared to fix the issue. 

I created a simple javascript test using nsIWebBrowserPersist to write a file directly to a Windows 7 share, this appears to work ok, even though its using the same NSPR code.

I am really stumped with this one. If anyone has anymore info I would be happy to look into this again.
Comment 19 Patrick Donker 2010-08-09 13:51:04 PDT
Glad to see that developers are trying to fix this nasty bug.
I run Win7_64 and can also vouch that the bug is also there in FF 3.6.8
Strange thing is that corruption not always occurs; re-downloading the file a number of times *could* result in a proper download, so files are not always corrupted. 
I hope a fix will become available soon as more and more people are starting to use Win7_64 and in this state FF is close to useless (as we all sometimes need to save something from the web).
I am more than willing to test possible fixes.
Good luck y'all!
Comment 20 Björn Stahlberg 2010-08-26 23:54:31 PDT
I can confirm this bug too on Win 7 64. As an administrator i am often downloading large files such as iso images to a mapped network share. For 50MB+ files chances are nearly 100% that they get corrupted.
Comment 21 KiboOst 2010-09-07 07:39:47 PDT
Same here, win7 pro x64 and FF 3.6.8
Also network admin and as Bjorn, I just can't save anything on any network drive (have differents ones from differents 2008 servers/raid5, same problem on each one).
No problem at all with IE8.
We have around 30 machines here and I can confirm it 100% on every single machine !
Comment 22 Alex 2010-09-10 17:46:50 PDT
I can confirm here too, saving anything via FF 3.6.9 running under Win7 pro x64 greater than approx 1MB to a Windows 2008 R2 network share results in corrupted downloads. The corruption can be negated by saving locally (i.e. not to the network share), the same download saves correctly to the network share location using alternative browsers.  So FF is the confirmed cause.
Comment 23 zel 2010-09-16 13:15:50 PDT
i have just got off the phone with microsoft t/s with what appears to be a fix from window's end.  

i've tested it a few times already with positive results.

this fix is for win7.  
tested on:
-ff v3.6.3 on workstation
-win7 32 bit ultimate (workstation)
-win7 32 bit ultimate (server)

open a cmd prompt 'as administrator' and type the command:
>netsh int tcp set global autotuninglevel=disabled

this definitely needs to be run on (at least) the server windows machine.  
it is possible that it needs to be run on *both* workstation and server
(i tried it first on only the server and still experienced corruption)

if for some reason, you wish to return autotuninglevel settings to normal, the command should be (untested by me):
>netsh int tcp set global autotuninglevel=normal

hope this helps find a solution!
Comment 24 zel 2010-09-16 13:18:14 PDT
correction:

>>>
this definitely needs to be run on (at least) the WORKSTATION windows machine.  
it is possible that it needs to be run on *both* workstation and server
(i tried it first on only the server and still experienced corruption)
<<<
Comment 25 Patrick Donker 2010-09-16 13:53:36 PDT
Great! This seems to fix the problem for now :)
I had to run it on both the server and the workstation side (Server 2k8_64, Workstation W7_64)
Hopefully this will be included into some Microsoft hotfix.
Thanks for sharing Zel!!
Comment 26 Patrick Donker 2010-09-16 14:13:24 PDT
I just Googled netsh int tcp set global autotuninglevel=disabled and it appears not to fix any bugs, just altering the way the IP stack works. Hopefully it'll help the FF developers locate the problem.
Comment 27 KiboOst 2010-09-30 01:34:42 PDT
I've done the tricks and verified it is done (show globals), on both server and stations, doesn't change. FF 3.6.10 still corrupt files when saving on network. IE, chrome and safari don't have this problem.
Comment 28 Toby Douglass 2010-09-30 02:36:50 PDT
Seconded.  I also have tried this fix (netsh int, etc) on the sharing server and client and it doesn't solve the problem for me.
Comment 29 Toby Douglass 2010-10-02 03:54:07 PDT
(In reply to comment #28)
> Seconded.  I also have tried this fix (netsh int, etc) on the sharing server
> and client and it doesn't solve the problem for me.

It's just struck me that I didn't reboot the server.  This netsh change - is a reboot required for it to take effect?
Comment 30 Steve Allison 2010-10-02 04:06:08 PDT
(In reply to comment #29)
> It's just struck me that I didn't reboot the server.  This netsh change - is a
> reboot required for it to take effect?

No.
Comment 31 Patrick Donker 2010-11-18 13:02:12 PST
Apparently the fix as suggested by Zel was only a temporary one. Downloads to a net share still get borked :(
Comment 32 Patrick Donker 2010-11-18 13:03:51 PST
Oh, and another thing; I am running FF 4.0b7 and corruption occurs with this version as well.
Comment 33 KiboOst 2010-11-19 00:06:40 PST
WOW ! Even FF4 have this bug ?? Is the official team, the developers, are aware of this ? I have problem everyday with this, telling people to NOT download to network drives we all work on, but on there desktop, then to move it to the right place on the network. Really annoying. This bug alone could make me switch to chrome or safari for everyone !! And will, if even FF4 is not fixed. Does anyone is in relation with developers of FF ? And why does the status in 'unconfirmed' ? It is well confirmed with everyone working on a network ! Domain or not !
Comment 34 Steve Allison 2010-11-19 00:39:06 PST
This bug used to drive me crazy at work. I sent a memo out explaining that FF had this bug yet I still got support calls for it.

In the end I added a GPO to block firefox.exe from being executed.

I think until this is worked on, FF has no place in Enterprise environments.
Comment 35 Björn Stahlberg 2010-11-19 01:04:36 PST
Jep, files still gettin corrupted here. This is really annoying and reduces productivity. I can't understand this bug still exists.
Comment 36 Tyler Retelle 2010-11-19 06:24:00 PST
While this bug is extremely annoying and should have been fixed by now (its been a year and a half from the original bug on this) I did post a workaround quite a while ago.  See Comment 16 for details.

All you need to do is disable SMB v2 on the client or workstation/server you are downloading to.  I did this on my Windows Server 2008 R2 server and have been running it that way ever since with no problems.  This is valid for Windows Server 2008/Windows 7.

Instructions:

http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm
Comment 37 KiboOst 2010-11-19 06:39:14 PST
Are you serious ? So the solution is to disable the new standard SMB which is use by default on all new server and workstation operating systems by microsoft ? A SMB2 which is faaar faster than SMB1 (we use files around 500Mo every minutes !!!) ? It could be a few days workaround but not much, we won't slow down every transfert for a browser bug, this is ridiculous.
I love FF, for years now, but since this bug is there for more than a year and several fix versions, I have no choice but look at other browsers. Shame.
Comment 38 Tyler Retelle 2010-11-19 07:03:23 PST
I never said anything about the workaround not being ugly, inconvenient, or impossible in certain environments.

I admit that it is ridiculous that you have to disable SMB v2 to stop the corruption, but that is the only "fix" I have found for this short of a developer finally fixing this long standing bug.  The other alternative is to use another browser, and that is your choice.

By the way, it would help if everyone on this list would vote for this bug.  With more votes, hopefully this will get some attention at last.
Comment 39 Attila Mesterhazy 2010-11-25 01:10:18 PST
We recently had to change from Samba 3 to Windows Server 2008 R2 on the server. Since than I alway get corrupted files when I try to download something directly to the server. I tried both FF 3.6.12 and FF 4.0b7 but always get the same result - corrupted files.
If I use IE8 to download the file directly to the server the file is OK.
Please! Someone try to fix it!
Comment 40 Attila Mesterhazy 2010-11-25 01:13:47 PST
Sorry, I forogt to write: the server operating system is Windows Server 2008 R2 x64, and the clients run Windows 7 x86 and x64.
Comment 41 Patrick Donker 2010-11-28 07:15:20 PST
Finally this bug has been picked up by the developers (or so it seems)
Comment 42 Renarde 2010-11-29 19:44:39 PST
I can confirm that zel's temporary fix, 

open a cmd prompt 'as administrator' and type the command:
>netsh int tcp set global autotuninglevel=disabled


does work for saving files to a mapped network drive, using Firefox 3.6.12. I'm downloading the files on a Win7x64 machine, and saving them on a Vistax86 machine.

The fact that this bug has existed for nearly two years is outrageous, and I will no longer recommend Firefox as a browser.
Comment 43 WADA 2010-11-29 22:48:45 PST
(In reply to comment #23)
> i have just got off the phone with microsoft t/s with what appears to be a fix
> from window's end.  
> i've tested it a few times already with positive results.
> open a cmd prompt 'as administrator' and type the command:
> >netsh int tcp set global autotuninglevel=disabled

It's command shown in "How to disable the Receive Window Auto-Tuning feature" section of MS KB #947239.
> http://support.microsoft.com/kb/947239
The feature is for TCP Window Scaling as receiving side of TCP. See documents listed in bug 566548 comment #44, please. Next is a part of the comment.
> (d) http://support.microsoft.com/kb/934430 (expnad "More Information" section)
> Because Vista enabled "TCP Window Scaling" as receiver side by default,
> similar problem to bug 541367 comment #17 looks to have happened with many routers.
It's similar problem of router to bug 541367/bug 566548(Receiver:SMTP, Sender=Tb). Win Vista case is : Receiver of TCP=Win Vista, Sender of TCP=other PC.

I think server OS/Networking which is used by SMBV2 server usually supports "TCP Window Scaling as both Receiver of TCP and Sender of TCP" relatively for long time, but "Windows TCP Scaling as Receiver of TCP on client PC" is very new.
So, if problem occurs in any enviroment of "SMBV2 server + SMBV2 client on Vista/Win 7" and if "disable Receive Window Auto-Tuning feature" is effective workaround, I guess it's problem at "SMBV2 client as Receiver of TCP"(e.g. overflow in SMBV2 client code) when "SMBV2 client as Receiver" advertised big TCP Window Size and "SMBV2 server as Sender of TCP" sent data using the advertised big TCP Window Size, although problem in "SMBV2 server as Receiver of TCP" is not ruled out yet.

In bug 566548(Receiver:SMTP, Sender=Tb), we worked around router's bug by limiting maximum send bufer size by network.tcp.sendbuffer=65536(default from Tb 3=131072, SMTP server advertises 128KB or larger TCP Window Size).
If my guess is not absolutely wrong, I think "disable Receive Window Auto-Tuning feature at SMBV2 client" is sufficient/reasonable workaround.

For Fx, file at SMBV2 server is similar to local file and Fx merely requests OS to read or write data from a file. After that, it's up to OS and SMB code.
If Fx can do something, I think it's "smaller read/write buffer size for file I/O" only, because problem of SMBV2 when TCP Window Scaling is used, although appplication code change to bypass bug of OS or others is tried in some cases.
Comment 44 Benjamin Smedberg [:bsmedberg] 2010-12-10 08:57:36 PST
Not a regression from Firefox 3.6, and it doesn't sound like a hugely common problem, so not a blocker. I'd like to see an NSPR-only fix.
Comment 45 Nelson Bolyard (seldom reads bugmail) 2010-12-10 10:36:42 PST
Bug 4090 is marked fixed.  There were subsequent improvements made to it 
for >2GB files.  See bug 4090 for the relevant patches.

All the evidence here suggests that this is a bug in SMB2, for which the 
workaround is to add extra otherwise-superfluous seeks to EOF before each
write.  

I'd be happy to entertain a new NSPR patch to workaround this Win7 bug in NSPR
as long as it's clear that THAT's what the change does.
Comment 46 Wan-Teh Chang 2010-12-10 12:12:29 PST
(In reply to comment #10)
>
> I do not understand why the current code seeks to eof before every write.

PR_APPEND is modeled after the Unix O_APPEND flag, and
the specification of O_APPEND says "the file offset shall
be set to the end of the file prior to each write.".  See
http://www.opengroup.org/onlinepubs/9699919799/functions/open.html

An application that doesn't call PR_Seek(64) between
PR_Write calls can work around this problem by opening
the file without the PR_APPEND flag, and calling
PR_Seek64 to seek to the end of file before writing
(if the file is not brand new).
Comment 47 WADA 2010-12-10 18:46:31 PST
(In reply to comment #46)
> "the file offset shall be set to the end of the file prior to each write.".  

AFAIK, "open write with append mode" is explained like next in Rexx, PHP etc.
  - After open, current position is placed at EOF if existent file.
  - Write position is placed at next of written data after write operation.
    (this may be applicable to replace mode too)
This is same as O_APPEND on "next write", but difference from O_APPEND may exist in "current position" after open/write or before write.

On Win, append type file writing is simply done like next in many cases.
  Open write with append mode, simply repeat write.
  - Doesn't care for/no need to care for position before/after write.
Does NSPR's position handling rely on current position or next write position returned by OS after write operation? Next in comment #3 is relevant to it?
> it looks like this error occurs due to SetFilePointer returning 0
> for the end of file after correctly writing a sequence of 32k chunks to the file.
If so, it may explain "Mozilla only problem". Other applications on Win simply repeat write requests, so they don't meet SMB2's bug.. 

(In reply to comment #45)
> workaround is to add extra otherwise-superfluous seeks to EOF before each write.

If it's not called "seek to EOF" and if it's called "seek to next write position", and if "open write replace" instead of "open write append", it's not superfluos. It's ordinal coding, although this kind of seek is usually not needed in ordinal environment for append type write operation.
  Open write with replace mode,
  seek to X1, write data of length=L1 at offset X1(replace if data exists),
  seek to X2, write data of length=L2 at offset X2(replace if data exists), ...
  "Append to file" is merely "X1 == EOF after file open" case.
  "Append type write" is merely special case of "X(n)=X(n-1)+L(n-1)" or
  "X(n)=EOF after each previous write".
  This kind of logic is perhaps usable even with append mode write open.
For "brand new" file, "open write & close" before above is sufficient.
Workaround may be called "back to basic mode", not "excess seek" :-)
Comment 48 Steve Harper 2010-12-31 13:00:29 PST
This change cannot be made in NSPR as nsLocalFileWin is used to open a file for IO on windows. The OpenFile function calls PR_ImportFile passing in the file handle returned from CreateFile.

I think the change needs to be made when the file is first opened (in nsLocalFileWin). Adding something along the following lines after the call to CreateFile:

if (osflags & PR_APPEND)
  MyFileSeek64(file, 0, FILE_END);

Inorder to stop a seek after every write on the windows platform, prfile.c in NSPR would need changing to 

#if !defined(_PR_HAVE_O_APPEND) && !defined(WIN32)  /* Bugzilla: 4090, 276330 */
    if (fd->secret->appendMode) {
        if (PR_Seek64(fd, 0, PR_SEEK_END) == -1) {
            return -1;
        }
    } /* if (fd->secret->appendMode...) */
#endif /* _PR_HAVE_O_APPEND */
Comment 49 Ryan 2010-12-31 23:20:52 PST
I had the same problem. zel's fix seems to work for me.  I applied it to the client pc only.  both systems are Windows 7 64 bit professional with firefox 3.6.13. 

the problem seemed to always affect zip files and quite often exe's.  I didn't really notice it only other file types, but maybe i just never saved them to a network drive.
Comment 50 Kevin Brosnan 2011-01-07 08:34:03 PST
*** Bug 621141 has been marked as a duplicate of this bug. ***
Comment 51 Attila Mesterhazy 2011-01-20 02:03:37 PST
Unfortunatelly for me zel's fix doen't work.
I tried it on a Win7 32 bit Enterprise client connected to a Win 2008 R2 64 bit server. I entered the following command:
netsh int tcp set global autotuninglevel=disabled
than restarted the computer.
But I still get corrupted EXE files if I download directly to a mapped network drive.
Comment 52 Attila Mesterhazy 2011-01-25 02:11:50 PST
Ok, what worked for me is to disable SMBv2 on the client machine. I tried it with Firefox 4.0 beta 9.
This is how I disabled SMBv2:
- started a Command prompt as administrator
sc config lanmanworkstation depend= bowser/mrxsmb10/nsi
sc config mrxsmb20 start= disabled

After restarting the computer the downloaded files to a mapped network share are not corrupted.
Comment 53 Patrick Donker 2011-01-25 02:18:52 PST
I still think that inventing all sorts of tricks to make FF work on W7-64 / 2k8 is unacceptable.
Comment 54 Kroby 2011-01-30 10:25:56 PST
I too am having this problem. I am running Firefox 3.6.13 on Windows 7 Professional 64 bit.  The file server is running Windows Server 2008 64 bit, and is attached as a mapped drive.  Saving large ISO files directly to the network share results in a corrupt file.  This was verified by checking the md5 check sum.  

If I download to a local drive there is no corruption.  Also, if I download the same file to the mapped drive using Internet Explorer there is no corruption.  This leads me to believe that this is indeed a problem with Firefox.  

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13
Comment 55 WADA 2011-01-30 16:25:27 PST
(In reply to comment #51)
> Unfortunatelly for me zel's fix doen't work.
> I tried it on a Win7 32 bit Enterprise client connected to a Win 2008 R2 64 
bit server.

Following is zel's comment in comment #23.
> this definitely needs to be run on (at least) the server windows machine.  
> it is possible that it needs to be run on *both* workstation and server
> (i tried it first on only the server and still experienced corruption)

Attila Mesterhazy, can you change setting of the "Win 2008 R2 64 bit server"?
If possible, can you check next?
             Win PC/SMB2 client  Win server/SMB2 server  Can be a workaround?
  zel's fix  not-applied         not-applied             N/A, original issue
  zel's fix  not-applied         applied                 No (zel's case)
  zel's fix  applied             no-applied              No (your case)
  zel's fix  applied             applied                 ?

(In reply to comment #49)
> I had the same problem. zel's fix seems to work for me.  I applied it to the
> client pc only.  both systems are Windows 7 64 bit professional with firefox
> 3.6.13. 
(In reply to comment #51)
> Unfortunatelly for me zel's fix doen't work.
> I tried it on a Win7 32 bit Enterprise client connected to a Win 2008 R2 64 
bit server.

It sounds difference of Server OS version.

KB976932 : for SP1 beta. 
> http://support.microsoft.com/?kbid=976932
> Article ID: 976932 - Last Review: July 13, 2010 - Revision: 2.0
> Information about Service Pack 1 Beta for Windows 7 and for Windows Server 2008 R2
SP1 Release Candidate.
> http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c3202ce6-4056-4059-8a1b-3a9b77cdfdda
> Windows 7 and Windows Server 2008 R2 Service Pack 1 Release Candidate (KB976932)
> Version: 976932
> Date Published: 10/26/2010

Is official SP1 already available? Is SP1 installed in your server?
Comment 56 Steve Allison 2011-01-30 16:59:01 PST
SP1 has no new features or updates, it is simply a container of all fixes since the OS release.

From the page you linked: "SP1 Beta contains previously released fixes that cover specific reliability, performance, and compatibility issues."

Zel's advice may fix the issue or not. But what we haven't found yet is why it *ONLY AFFECTS FIREFOX*. Opera, Chrome, IE, Free Download Manager, Reget etc. etc. all function fine over SMBv2 and new TCP/IP stack.

Firefox is doing something. It is either doing something wrong, or it is triggering a bug that the others don't trigger.

As for your question as to what combinations trigger the issue, I will perform the tests over the coming days at my home network and send the results here.
Comment 57 WADA 2011-01-30 18:01:46 PST
(In reply to comment #56)
> why it *ONLY AFFECTS FIREFOX*. Opera, Chrome, IE, Free Download Manager,
> Reget etc. etc. all function fine over SMBv2 and new TCP/IP stack.

Please see Comment #3 by Steve Harper.
> SetFilePointer returning 0 for the end of file after correctly writing
> a sequence of 32k chunks to the file.
As Mozilla requests "seek to EOF" before write(or after previous write) and as "SetFilePointer to EOF" returns ZERO if SMB2(i.e. write position is set to ZERO by OS)", next write is always executed to offset=ZERO.
It's probably bug of SMB2 but may be design or restliction of caching by SMB2.
As I wrote before, "seek to EOF" for each write is not popular coding on Win when append mode write is used for file I/O. I guess it's reason why Firefox/Thunderbird only issue. 

> Firefox is doing something. It is either doing something wrong, or it is
> triggering a bug that the others don't trigger.

Comment #48 by Steve Harper is probably to skip "set write position to EOF" if Win, in order to bypass SMB2's bug if it's real bug of SMB2.

FYI.
Following is a page found by Google search for "windows smb2 setfilepointer eof".
> http://social.msdn.microsoft.com/Forums/en/os_fileservices/thread/832d395b-6e6f-4658-8dbb-120138a4cd7c
SMB2's problem of this article found by Pervasive PSQL is;
> The FindFirstFile returns false even though the file exists but if we wait
> for approximately 6 seconds   the result from FindFirstFile is true.
Next registry key looks relevant to the problem. 
> HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanWorkStation\Parameters
> FileInfoCacheLifetime      Default value: 10 (seconds)
> FileNotFoundCacheLifetime  Default value:  5 (seconds)
> DirectoryCacheLifetime     Default value: 10 (seconds)
> References:
> [MS-SMB2]: Server Message Block (SMB) Version 2 Protocol Specification
> http://msdn.microsoft.com/en-us/library/cc246482(PROT.13).aspx
> 3.2.4.17 Application Requests Enumerating a Directory
> http://msdn.microsoft.com/en-us/library/cc246600(PROT.13).aspx
> (For behavior as LAN server, LanManServer looks used)
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver

"EOF position or next write position after write with append mode" is perhaps categorized in FileInfo, so registry key of FileInfoCacheLifetime may be relevant to this bug, in addition to "TCP Window Scaling".
> Value type: REG_DWORD
> Value name: FileInfoCacheLifetime
> Default:    10 (seconds)
Comment 58 Steve Harper 2011-01-31 11:49:40 PST
I think other browsers just write to a temp file on your local drive and then copy it to the specified location after. Firefox will write to the remote drive directly and this is why we get issues with SMB2. It does have something to do with the specific usage Scenario in firefox as I cannot reproduce the defect in Windows with the same API calls. The fix I talked about in my last post does address the issue but is not an acceptable fix as far as the module owners are concerned. I know whats causing the issue but I dont know why. Maybe it has something to do with the new SMB2 caching but I assume this would only be a problem if two clients were read/writing to the file at the same time. It would help if someone could test the registry keys posted above. If it does turn out to be a problem with caching maybe we can find a more acceptable solution.
Comment 59 Wan-Teh Chang 2011-01-31 14:57:07 PST
I didn't realize this bug is assigned to me.  Perhaps
that's why nothing has happened.  Can a Mozilla driver
reassign this bug to a real Mozilla developer?

This bug should take no more than two hours to fix if
the right people look into it.  Unfortunately I am
not familiar with the file download code in Mozilla.

If someone can send me the call stack when the file
download code calls the PR_Open (or PR_OpenFile) and
FileWrite functions in nsprpub/pr/src/io/prfile.c,
I can take a stab at it.
Comment 60 Shawn Wilsher :sdwilsh 2011-01-31 15:12:17 PST
(In reply to comment #59)
> If someone can send me the call stack when the file
> download code calls the PR_Open (or PR_OpenFile) and
> FileWrite functions in nsprpub/pr/src/io/prfile.c,
> I can take a stab at it.
Myself or bz would probably be the most qualified there, but not sure when I'll have cycles to do that.
Comment 61 Patrick Donker 2011-01-31 15:21:49 PST
Hopefully you will find the time to fix this nasty bug because for me it will be a definite show-stopper. It really should be fixed before I will distribute FF in the enterprise environment I administer, and I guess many will agree with me.
Comment 62 Steve Harper 2011-01-31 15:35:10 PST
OpenFile(const nsString & name={...}, int osflags=18, int mode=384, PRFileDesc * * fd=0x0038bd84) Line 390
nsLocalFile::OpenNSPRFileDesc(int flags=18, int mode=384, PRFileDesc * * _retval=0x0038bd84) Line 961
nsFileOutputStream::Init(nsIFile * file=0x0a480718, int ioFlags=18, int perm=384, int behaviorFlags=0) Line 399
NS_NewLocalFileOutputStream(nsIOutputStream * * result=0x0038bed4, nsIFile * file=0x0a480718, int ioFlags=18, int perm=384, int behaviorFlags=0) Line 974
nsExternalAppHandler::SaveToDisk(nsIFile * aNewFileLocation=0x00000000, int aRememberThisPreference=0) Line 2180
nsExternalAppHandler::OnStartRequest(nsIRequest * request=0x0852dc18, nsISupports * aCtxt=0x00000000) Line 1658

_PR_MD_WRITE(PRFileDesc * fd=0x07c4f1d8, const void * buf=0x08551448, int len=32768) Line 520
FileWrite(PRFileDesc * fd=0x07c4f1d8, const void * buf=0x08551448, int amount=32768) Line 109
PR_Write(PRFileDesc * fd=0x07c4f1d8, const void * buf=0x08551448, int amount=32768) Line 146
nsFileOutputStream::Write(const char * buf=0x08551448, unsigned int count=32768, unsigned int * result=0x0038c2d8) Line 418
nsBufferedOutputStream::Flush() Line 558	C++
nsBufferedOutputStream::Write(const char * buf=0x0853dbe8, unsigned int count=32766, unsigned int * result=0x0038c3d0) Line 541
nsExternalAppHandler::OnDataAvailable(nsIRequest * request=0x0852dc18, nsISupports * aCtxt=0x00000000, nsIInputStream * inStr=0x0a7fa2f8, unsigned int sourceOffset=0, unsigned int count=720898) Line 1811
Comment 63 Shawn Wilsher :sdwilsh 2011-01-31 16:07:54 PST
(In reply to comment #62)
There is also the case where we resume a download at restart, which hits this code:
https://hg.mozilla.org/mozilla-central/annotate/0a47be5cdd94/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#l824
Comment 64 Wan-Teh Chang 2011-01-31 17:31:06 PST
Comment on attachment 442882 [details] [diff] [review]
A patch to set the file pointer to the end of file when its opened and the PR_APPEND flag is set

Steve, Shawn: thank you for the call stacks in comment 62 and comment 63.
With that info, I think Steve's change to xpcom/io/nsLocalFileWin.cpp
is a good fix for this bug.

The change to nsprpub/pr/src/io/prfile.c in this patch won't help
this Mozilla file download bug.  (The change is incorrect anyway.)
Also, since the behavior of PR_APPEND is documented, we may not be
able to change it easily.

I suggest two changes.

1. Move the following code
   
>+	if (osflags & PR_APPEND) {
>+        if ( SetFilePointer(file, 0, 0, FILE_END) == 0xFFFFFFFF ) {
>+            CloseHandle(file);
>+            return -1;
>+        }
>+    }

to the front of

>     *fd = PR_ImportFile((PROsfd) file);

2. Remove lines 342-375:

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#342

It is BAD to depend on NSPR internals like this!  Fortunately Steve's
change eliminates the dependency on NSPR's PRFilePrivate structure as
a byproduct.
Comment 65 Wan-Teh Chang 2011-01-31 17:42:39 PST
Two more comments:

1. MSDN sample code "Appending One File to Another File"
http://msdn.microsoft.com/en-us/library/aa363778(v=vs.85).aspx
shows calling SetFilePointer before each WriteFile call.
This means NSPR's implementation of PR_APPEND should be
correct, unless we also need the LockFile/UnlockFile calls.

Anyone want to volunteer to submit a bug report to Microsoft?

2. In the MSDN CreateFile page
http://msdn.microsoft.com/en-us/library/aa363858(v=vs.85).aspx
there is a user comment "Opening File for APPEND access"
that suggests passing FILE_APPEND_DATA as the second argument
to CreateFile.  (See also
http://msdn.microsoft.com/en-us/library/ff548289.aspx )
We can also try that.
Comment 66 Wan-Teh Chang 2011-01-31 18:03:37 PST
I found yet another way to implement append mode:
http://msdn.microsoft.com/en-us/library/aa365747(VS.85).aspx

  To write to the end of file, specify both the Offset and
  OffsetHigh members of the OVERLAPPED structure as 0xFFFFFFFF.
  This is functionally equivalent to previously calling the
  CreateFile function to open hFile using FILE_APPEND_DATA access.

This solution will require xpcom/io/nsLocalFileWin.cpp to
depend on NSPR's PRFilePrivate structure though.  So
opening the file using FILE_APPEND_DATA access is still a
better solution
Comment 67 Patrick 2011-02-01 01:11:52 PST
I have no idea what all the new code and stuff that has been posted is, but does this mean a fix is coming soon?
Comment 68 WADA 2011-02-01 03:15:13 PST
(In reply to comment #65)
> Anyone want to volunteer to submit a bug report to Microsoft?

Wan-Teh Chang, can you explain "which SetFilePointer call in which source module called by which moule's which call that is sheduled at which thread/task at which timing" produced ZERO offset by SetFilePointer with SMB2?
Is "multiple clients were read/writing to the file at the same time." and/or "FileInfoCacheLifetime of SMB2" relevant to the issue? 

Steve Harper, can you submit bug report to Microsoft?
Comment 69 Steve Harper 2011-02-01 06:12:01 PST
I can understand the reluctance in adding this hack to firefox, as I'm almost certain the bug is in windows 7 and the client side smb redirector. If i submit this issue to Microsoft then I need to be able to reproduce the issue consistantly with a small proof of concept App. I have been unable to do this and the reason this is happening in firefox is a mystery to me. I don't think this issue is related to the caching problem that's already known about.
Comment 70 Patrick Donker 2011-02-01 06:13:30 PST
Is there a way we can help in pinpointing the possible 7 bug?
Comment 71 Wan-Teh Chang 2011-02-01 08:48:18 PST
Created attachment 508780 [details] [diff] [review]
Proposed patch

Please review and test this patch.  This is the best
solution because it also solves the atomicity problem.
If this doesn't work, I'll use Steve Harper's SetFilePointer
solution.

1. Implement append mode by opening the file with FILE_APPEND_DATA
but without FILE_WRITE_DATA.
2. Remove PR_APPEND when we open a file read-only.

The NSPR internal code I remove was added in bug 331453:
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsLocalFileWin.cpp&branch=1.186&root=/cvsroot&subdir=mozilla/xpcom/io&command=DIFF_FRAMESET&rev1=1.153&rev2=1.154

Although jshin and darin asked for my opinion, I was on
bereavement leave and didn't read the email after I
returned to work.
Comment 72 Shawn Wilsher :sdwilsh 2011-02-01 09:31:03 PST
Comment on attachment 508780 [details] [diff] [review]
Proposed patch

I'm going to defer this to someone who knows windows APIs better (either jimm or robstrong)
Comment 73 Steve Harper 2011-02-02 01:19:09 PST
I will take a look at this patch and test it as soon as I get the oppertunity.

Cheers,
Steve
Comment 74 Attila Mesterhazy 2011-02-03 05:59:46 PST
(In reply to comment #55)
> (In reply to comment #51)
> > Unfortunatelly for me zel's fix doen't work.
> > I tried it on a Win7 32 bit Enterprise client connected to a Win 2008 R2 64 
> bit server.
> 
> Following is zel's comment in comment #23.
> > this definitely needs to be run on (at least) the server windows machine.  
> > it is possible that it needs to be run on *both* workstation and server
> > (i tried it first on only the server and still experienced corruption)
> 
> Attila Mesterhazy, can you change setting of the "Win 2008 R2 64 bit server"?
> If possible, can you check next?
>              Win PC/SMB2 client  Win server/SMB2 server  Can be a workaround?
>   zel's fix  not-applied         not-applied             N/A, original issue
>   zel's fix  not-applied         applied                 No (zel's case)
>   zel's fix  applied             no-applied              No (your case)
>   zel's fix  applied             applied                 ?
> 
> (In reply to comment #49)
> > I had the same problem. zel's fix seems to work for me.  I applied it to the
> > client pc only.  both systems are Windows 7 64 bit professional with firefox
> > 3.6.13. 
> (In reply to comment #51)
> > Unfortunatelly for me zel's fix doen't work.
> > I tried it on a Win7 32 bit Enterprise client connected to a Win 2008 R2 64 
> bit server.
> 
> It sounds difference of Server OS version.
> 
> KB976932 : for SP1 beta. 
> > http://support.microsoft.com/?kbid=976932
> > Article ID: 976932 - Last Review: July 13, 2010 - Revision: 2.0
> > Information about Service Pack 1 Beta for Windows 7 and for Windows Server 2008 R2
> SP1 Release Candidate.
> > http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c3202ce6-4056-4059-8a1b-3a9b77cdfdda
> > Windows 7 and Windows Server 2008 R2 Service Pack 1 Release Candidate (KB976932)
> > Version: 976932
> > Date Published: 10/26/2010
> 
> Is official SP1 already available? Is SP1 installed in your server?

Unfortunatelly I do not have access to the Windows server, so I can only test client side solutions.
Comment 75 Wan-Teh Chang 2011-02-03 11:44:35 PST
Comment on attachment 508780 [details] [diff] [review]
Proposed patch

sdwilsh: can you push my patch to a try server and tell people
where to download the try server build?  This will enable more
people affected by this bug to test my patch.  Thanks.
Comment 76 Shawn Wilsher :sdwilsh 2011-02-03 16:06:26 PST
(In reply to comment #75)
> sdwilsh: can you push my patch to a try server and tell people
> where to download the try server build?  This will enable more
> people affected by this bug to test my patch.  Thanks.
Pushed to http://tbpl.mozilla.org/?tree=MozillaTry&rev=9664bb328e5b (builds only; no tests).

Builds will start to show up here:
https://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/sdwilsh@shawnwilsher.com-9664bb328e5b
(it's a 404 until the first build is done)
Comment 77 Patrick Donker 2011-02-03 23:47:05 PST
I have downloaded firefox-4.0b12pre.en-US.win32.installer.exe to a local drive, installed it, ran it, downloaded firefox-4.0b12pre.en-US.win32.installer.exe again, but this time to a network share and guess what....it WORKS!
Looks like the new code solved the issue.
Thanks guys :)
Comment 78 Wan-Teh Chang 2011-02-04 07:23:16 PST
Thank you for the test report!

Please also test the "Pause" and "Resume" buttons for downloads.

To avoid posting too many comments to this bug report, please
email me your test reports.
Comment 79 Attila Mesterhazy 2011-02-04 13:48:19 PST
(In reply to comment #78)

I sent you a report but wanted to thank you for this fix. Can you tell us if beta 11 will include this fix?
Comment 80 Shawn Wilsher :sdwilsh 2011-02-04 13:49:01 PST
(In reply to comment #79)
> I sent you a report but wanted to thank you for this fix. Can you tell us if
> beta 11 will include this fix?
The earliest this could make it in would be beta 12.
Comment 81 Wan-Teh Chang 2011-02-04 20:05:47 PST
Comment on attachment 508780 [details] [diff] [review]
Proposed patch

Here are some notes for code reviewers.  This patch contains
the following changes.

1. Don't use PR_APPEND with PR_RDONLY (read-only mode) because
PR_APPEND is relevant only to writes.

2. Remove the duplication of private NSPR type definitions,
used by the original implementation of append mode.

3. Change GENERIC_READ to FILE_GENERIC_READ, and
          GENERIC_WRITE to FILE_GENERIC_WRITE.

For files, GENERIC_WRITE is equivalent to FILE_GENERIC_WRITE,
but FILE_GENERIC_WRITE is a bitwise OR of the actual access
bits and allows us to turn off the FILE_WRITE_DATA access bit.

4. Implement append mode by turning off the FILE_WRITE_DATA
bit in the FILE_GENERIC_WRITE bitmask.  This is documented in
http://msdn.microsoft.com/en-us/library/gg258116(v=vs.85).aspx:

  FILE_APPEND_DATA  For a file object, the right to append
  4                 data to the file. (For local files, write
                    operations will not overwrite existing
                    data if this flag is specified without
                    FILE_WRITE_DATA.)

Note it says "local files".  I am worried that this won't
work for files on network drives (remote files).

5. Delete the original implementation of append mode, which
is not atomic, and is affected by this presumed Windows 7
SMB2 bug.
Comment 82 WADA 2011-02-04 20:46:19 PST
(In reply to comment #81)
>   FILE_APPEND_DATA  For a file object, the right to append
>   4                 data to the file. (For local files, write
>                     operations will not overwrite existing
>                     data if this flag is specified without
>                     FILE_WRITE_DATA.)
> Note it says "local files".  I am worried that this won't
> work for files on network drives (remote files).

FYI.
If issue exists with "FILE_APPEND_DATA without FILE_WRITE_DATA" and network file and SMB2, "FILE_WRITE_TO_END_OF_FILE by WriteFile" may be a solution, with minimum changes because any SEEK/SetFilePointer can't affect on write position if FILE_WRITE_TO_END_OF_FILE, without changing current implementation/documentation around PR_APPEND on Win.
It may be also a simple solution of "not atomic" issue in original implementation of append mode, unless it's also affected by presumed SMB2 bug. 

> http://msdn.microsoft.com/en-us/library/aa365747(v=vs.85).aspx#2
> WriteFile Function
>
> Appending data automatically
> You can use the data for this write to be written at the EOF (rather than the
> current file pointer) by using the OVERLAPPED structucture and specifying
> OVERLAPPED.Offset = FILE_WRITE_TO_END_OF_FILE and OVERLAPPED.OffsetHigh = -1
>(snip)
Unfortunately, FILE_WRITE_TO_END_OF_FILE is not currently in the platform SDK headers, but it is in the WDK headers:
> #define FILE_WRITE_TO_END_OF_FILE       0xffffffff
> I believe this will be automatic with any other append operations to the file
> (regardless of whether such operations are on the same or a different file
> handle.) This is a much better solution than the classic seek-to-eof and
> write that I often see in code.
Comment 83 Wan-Teh Chang 2011-02-04 21:52:21 PST
WADA: thank you for the comment.

The reason I want to try "FILE_APPEND_DATA without FILE_WRITE_DATA"
first is that it allows our WriteFile() call to update the file
pointer.  (Some callers of PR_Write() expect it to update the file
pointer.)

If I use "FILE_WRITE_TO_END_OF_FILE by WriteFile", the WriteFile()
call is not documented to update the file pointer.  The WriteFile
MSDN page says:

  * If lpOverlapped is NULL, the write operation starts at the
    current file position and WriteFile does not return until the
    operation is complete, and the system updates the file pointer
    before WriteFile returns.
  * If lpOverlapped is not NULL, the write operation starts at
    the offset that is specified in the OVERLAPPED structure and
    WriteFile does not return until the write operation is
    complete. The system updates the OVERLAPPED offset before
    WriteFile returns.

I just did an experiment and found that "FILE_WRITE_TO_END_OF_FILE
by WriteFile" also updates the file pointer.  (Moreover, it does
NOT update the OVERLAPPED offset, contrary to what the MSDN page
says.)  So this solution is also acceptable.  The only disadvantage
is that it requires more changes to NSPR.
Comment 84 WADA 2011-02-04 23:07:35 PST
(In reply to comment #81)
> 5. Delete the original implementation of append mode, which
> is not atomic, and is affected by this presumed Windows 7
> SMB2 bug.

Wan-Teh Chang, this bug's problem with SMB2 is next?
(1) File is opened with FILE_WRITE_DATA + FILE_APPEND_DATA by Tb.

    Note:
    MS Win's "open write with append flag" was different from my thought.
    "Open with FILE_APPEND_DATA, without FILE_WRITE_DATA" was required
    to force MS Win next: (implementation of "append mode" by PHP is next)
       Open with pure append mode. ("open write with append", if PHP)
       Always add data to EOF position, regardless of current position.
       SEEK doesn't affect on write position, if opened with apped mode.
    Your current patch produces similar behaviour of Mozilla to PHP's
    "open write with append mode".

    MS Win's FILE_APPEND_DATA flag seems similar to Unix O_APPEND flag.
    MS perhaps added behaviour of "FILE_APPEND_DATA, no FILE_WRITE_DATA" mode
    after initial implementation of FILE_APPEND_DATA, for ease of programing.
    And, MS perhaps added FILE_WRITE_TO_END_OF_FILE, for ease of
    "append mode" file writing by applications, including MS's applications,
    without restrictions by "FILE_APPEND_DATA, no FILE_WRITE_DATA" mode
    such that "seek to non EOF and rewrite data" is impossible.

(2) Different file handles are used by; (== "not atomic" you call)
    (a) Initial data writing to file by FileWrite.
    (b) Asynchrnous data append when additional data is available by HTTP GETs. 
(3) Win/SMB2 somehow sets write position to ZERO by SetFilePonter(to EOF)
    when requested by (b).
Comment 85 Jim Mathies [:jimm] 2011-02-05 09:50:30 PST
nit - can we wrap this to 80 chars:

+        PRFileDesc* file = PR_Open(mWorkingPath.get(), PR_RDONLY | PR_CREATE_FILE | PR_EXCL, attributes);

r+ from me with that change.
Comment 86 Wan-Teh Chang 2011-02-05 10:39:58 PST
Created attachment 510033 [details] [diff] [review]
Proposed patch v2

I folded the long line in xpcom/io/nsLocalFileOS2.cpp
to 80 characters as jmathies requested.
Comment 87 Wan-Teh Chang 2011-02-05 10:44:29 PST
A thread on java.openjdk.core-libs.devel mentioned problems
with the "FILE_APPEND_DATA but not FILE_WRITE_DATA" solution:
http://osdir.com/ml/java.openjdk.core-libs.devel/2008-06/msg00010.html

  Unfortunately, the removal of FILE_WRITE_DATA permissions
  causes file locking and file truncation to fail.

So we should investigate if Mozilla locks or truncates the
files opened with PR_APPEND.
Comment 88 Steve Harper 2011-02-06 09:28:06 PST
Created attachment 510126 [details] [diff] [review]
Patch to set file pointer to EOF when opening file for first time

I downloaded the "firefox-4.0b12pre.en-US.win32.installer.exe" from the test server but still encountered problems downloading the same file multiple times. When I downloaded a file the first time the file was ok but every time after that the file was corrupt on the Windows 7 share. I have uploaded a patch that appears to work 100% of the time. The patch incorporates the changes suggest by Wan-Teh Chang.
Comment 89 WADA 2011-02-06 18:27:43 PST
(In reply to comment #88)
> I downloaded the "firefox-4.0b12pre.en-US.win32.installer.exe" from the test
> server but still encountered problems downloading the same file multiple times.

Checked using next try server build ponted by Shawn Wilsher in Comment #76,
> https://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/sdwilsh@shawnwilsher.com-9664bb328e5b
and executed download test of next zip file?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-4.0b12pre.en-US.win32.installer.exe

> When I downloaded a file the first time the file was ok but every time after
> that the file was corrupt on the Windows 7 share.

Proces Monitor log with next filter may help problem analysis.
> Filter:
>  Process Name, is, firefox.exe, Include
>  Path, contains, \Temp, Include
>  Path, contains, \firefox-4, Include (if download test of firefox-4.0....zip)
Process Monitor.
> http://technet.microsoft.com/en-us/sysinternals/bb545027.aspx
> http://technet.microsoft.com/en-us/sysinternals/bb896645
Comment 90 WADA 2011-02-06 18:52:24 PST
FYI.
Following is about:cache, Disk Cache entry, for a zip file at ftp.mozilla.org, after download twice. 
> Key: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-4.0b12pre.en-US.win32.zip
> Data size: 14853388 bytes
> Fetch count: 2
> Last modified: 2011-02-07 11:38:29
>       Expires: 2011-02-07 12:52:21
Because cache life time is not ZERO, as seen in "Fetch count: 2", second and later download doesn't request HTTP GET and data is obtained from Disk Cache.
It may affect on Firefox's behaviour in file I/O for download.
Comment 91 WADA 2011-02-06 20:19:28 PST
Created attachment 510192 [details]
Process Monitor log with tryserver build, download to local drive

(0) Checked using next tryserver build.
> https://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/sdwilsh@shawnwilsher.com-9664bb328e5b/tryserver-win32/firefox-4.0b12pre.en-US.win32.zip
    Always save to C:\wada\DOWNLOAD\@@@\@@@\@@@-Y (Same drive as \Temp).
    "Always ask where to save" is unchecked. 
    Cache is cleared(HTTP GET is issued upon download).
    Download of next zip file.
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-4.0b12pre.en-US.win32.zip
(1) Download is requested, and download starts.
    Write open(not append) of C:\...\Temp\EZs_V5s0.zip.part
      67163 00:01:32.5806637 12:30:38.5494793 firefox.exe
      C:\Documents and Settings\wada\Local Settings\Temp\EZs_V5s0.zip.part
      IRP_MJ_CREATE SUCCESS Desired Access:
        Generic Write, Read Attributes, Disposition: OpenIf, Options:
        Synchronous IO Non-Alert, Non-Directory File, Attributes: n/a,
        ShareMode: Read, Write, AllocationSize: 0, OpenResult: Opened
    File write of each 32KB, with offset specified
(2) File picker dialog is opened(C:\...\\@@@-Y is preselected)
    Reply "OK"
(3) C:\...\@@@-Y\firefox-4.0b12pre.en-US.win32.zip.part	is created.
    As read data for C:\...\Temp\EZs_V5s0.zip.part is not seen in log,
    and as target drive==same drive as \Temp drive,
    I think "move C:\...\Temp\EZs_V5s0.zip.part to \...\@@@-Y\f....zip.part"
    is requested by Firefox when download target directry is determined.
(4) C:\...\@@@-Y\firefox-4.0b12pre.en-US.win32.zip.part	is opened with Append.
    68164 00:01:34.8570790 12:30:40.8258946 firefox.exe
    C:\wada\DOWNLOAD\@@@\@@@\@@@-Y\firefox-4.0b12pre.en-US.win32.zip.part
    IRP_MJ_CREATE SUCCESS Desired Access:
       Append Data/Add Subdirectory/Create Pipe Instance, Write EA,
       Read Attributes, Write Attributes, Read Control, Synchronize,
       Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory
       File, Attributes: n/a, ShareMode: Read, Write, AllocationSize: n/a,
       OpenResult: Opened
(5) Data is written to C:\...\@@@-Y\firefox-4.0b12pre.en-US.win32.zip.part
    with offset = -1 (append mode write)
    69216 00:01:35.0002855 12:30:40.9691011 firefox.exe
    C:\wada\DOWNLOAD\@@@\@@@\@@@-Y\firefox-4.0b12pre.en-US.win32.zip.part
    Write IRP_MJ_WRITE	SUCCESS	Offset: -1, Length: 32,768

If problem occurs with SMB2, I guess it's upon switch from \Temp to target drive(if SMB2 is used, different drive from \Temp directory).
Append mode open followed by "write with offset=-1" in above step (4) requires SetFilePointer to EOF by user after append mode open?
Or, if SMB2, offset=ZERO is set by Win when "SetFilePointer to EOF by Fx" is requested even when "Append mode open"?
Comment 92 Wan-Teh Chang 2011-02-07 15:46:03 PST
(In reply to comment #88)

Steve,

Can you apply my "Proposed patch v2" to your source tree and
test your own build?  You are perhaps the only person who can
reproduce the bug and do a Firefox 4 build, so you're in the
best position to produce the final fix.  I can only study MSDN
and speculate potential fixes.  Thanks.
Comment 93 Wan-Teh Chang 2011-02-08 21:08:09 PST
I confirmed with an experiment that the "FILE_APPEND_DATA without
FILE_WRITE_DATA" solution causes the Win32 function LockFile and
SetEndOfFile to fail with ERROR_ACCESS_DENIED (5).

However, this limitation may not matter to Mozilla.  I verified
that Mozilla doesn't call LockFile or LockFileEx.  I also suspect
(but haven't verified) that Mozilla doesn't call SetEndOfFile on
a file opened with PR_APPEND.
Comment 94 WADA 2011-02-08 22:31:53 PST
(In reply to comment #93)
> I confirmed with an experiment that the "FILE_APPEND_DATA without
> FILE_WRITE_DATA" solution causes the Win32 function LockFile and
> SetEndOfFile to fail with ERROR_ACCESS_DENIED (5).

SetEndOfFile requests GENERIC_WRITE, so FILE_WRITE_DATA flag is mandatory.
> http://msdn.microsoft.com/en-us/library/aa365531(v=VS.85).aspx
> SetEndOfFile Function
> The file handle must be created with the GENERIC_WRITE access right.
I don't think truncate is mandatory for "append mode or PR_APPEND of Mozilla on Win".

On the other hand, FILE_WRITE_DATA flag is not mandatory for LockFile, and GENERIC_READ is sufficient.
> http://msdn.microsoft.com/en-us/library/aa365202(v=VS.85).aspx
> LockFile Function
> The file handle must have been created with the GENERIC_READ or GENERIC_WRITE access right.
If LockFile is required for Mozilla's PR_APPEND, I think "GENERIC_READ + GENERIC_WRITE - FILE_WRITE_DATA" can be used for PR_APPEND unless PR_APPEND forces "write only, no read".

As next is seen in Process Monitor log after append mode open,
> IRP_MJ_WRITE SUCCESS Offset: -1, Length: 32,768
I guess Win's "append mode" is next;
  If opened with FILE_APPEND_DATA flag but without FILE_WRITE_DATA flag,
  force FILE_WRITE_TO_END_OF_FILE internally in any WriteFile request.
I think most natural and compatible implementation of PR_APPEND on Win is;
  Open with "GENERIC_READ + GENERIC_WRITE" as current,
  and use FILE_WRITE_TO_END_OF_FILE always in WriteFile,
and, most natural implementation of "open write with append mode" on Win is;
  Open with FILE_APPEND_DATA but without FILE_WRITE_DATA.
Comment 95 Daniel Veditz [:dveditz] 2011-02-09 10:52:07 PST
Not going to "block" the next 3.6 update on this bug, but "wanted" when we have a tried-and-tested (on trunk) fix.
Comment 96 Steve Harper 2011-02-09 18:02:04 PST
Ok, I have found the problem and developed a small patch for nsExternalHelperAppService.cpp. This is a Windows 7 bug and we do not need to change NSPR. I could not reproduce the issue as I thought it was a problem with file IO. SetFilePointer returning zero when opening a file for write append is only a symptom.

I will post the patch tomorrow after work. Hopefully we can get this on a test server for people to try and be rid of this annoying issue once and for all.

Thanks,
Steve
Comment 97 Steve Harper 2011-02-10 13:33:06 PST
Created attachment 511498 [details] [diff] [review]
Patch to fix bug in Windows 7 when copying a file to a share using SMBV2

After moving a file from a Windows 7 machine to a Windows 7 mapped drive SetFilePointer randomly returns 0 for the end of file. We have to wait approximately 10 to 15 seconds after the MoveFile call before appending any data to the remote file.

I managed to locate this bug after examining the code in nsWebBrowserPersist.cpp. Downloads that use nsWebBrowserPersist always download correctly even with append mode set. This is because nsBrowserPersist does not Move the file from $temp to the mapped drive it creates it directly at the location specified. I have tested this patch by downloading 25 files of varying size on firefox and Internet explorer and then performing a binary compare.
Comment 98 WADA 2011-02-10 18:30:43 PST
(In reply to comment #97)

IIRC, "Move the file from $temp to save target directory(same drive, different local drive, or mapped drive)" was done on purpose by request like next.
  Don't continue download at \Temp, once target drive/directory is determined.
When target is different local drive or mapped drive from drive of \Temp, whole downloaded data should be copied from \Temp(usually on C:) to target drive after completion of download to \Temp, if whole data is downloaded to \Temp. "copy partial data in \Temp to target, and continue download at target drive" is to reduce data size which is needed to copy from \Temp to target drive. 

Will this "switch to target drive" feature be killed by your patch?

> After moving a file from a Windows 7 machine to a Windows 7 mapped drive
> SetFilePointer randomly returns 0 for the end of file. We have to wait
> approximately 10 to 15 seconds after the MoveFile call before appending any
> data to the remote file.

It's spec of SMB2's caching, isn't it. If so, I guess MS says that "non atomic"(probably different file handle by different task. i.e. multiple SMB2 clients) of Mozilla is root cause of this bug when SMB2 is used.
Steve Harper, can you check with xxxCacheLifeTime=0? (to report bug to MS if Win's bug)
Comment 99 WADA 2011-02-11 01:24:28 PST
FYI.

Following is for "deleted file can not be re-defined/re-used for 30 seconds".
> http://support.microsoft.com/kb/322864
> Article ID: 322864 - Last Review: October 31, 2006 - Revision: 4.1
> Deleted files are not immediately removed and cannot be overwritten

Following is a document relates to SMB2(by MS people).
> http://www.snia.org/events/storage-developer2010/presentations/wednesday/MatthewGeorge-DavidKruse-Analyzing_Metadata_SMB2-v2.pdf
> Analyzing Metadata Caching in the Windows SMB2 Client 
Next is seen in the document.
> Key Takeaways
> Be aware that there will always be applications that require very strict metadata consistency gurantee
> and "best effort" caching may not work in those cases.

Software who tries to "Delete a file and re-use same file name immediately" or "Copy a file to target file, and append data to the target file immediately after copy" seems an "applications that require very strict metadata consistency gurantee" :-)
Comment 100 WADA 2011-02-12 21:50:42 PST
FYI.
Following is a document for "MSI package provided by Alaska Software for his customers", which sets ZERO in FileInfoCacheLifetime/FileNotFoundCacheLifetime/DirectoryCacheLifetime, in order to protect his customers from file corruption produced by MS's decision on SMB2's caching that cache coherency of file meta information is not kept always.
> http://www.alaska-software.com/fixes/smb2/overview.shtm
> Details
> With the advent of Windows Vista and Windows 7, Microsoft introduced a new
> network protocol (SMB2) to optimize file sharing for WAN and low bandwidth
> and high latency scenarios. To optimize these types of file access scenarios,
> Microsoft performed design decisions which lead to the inability of the new
> SMB2 protocol to handle cache coherency of file meta information such as
> the file size, the last update time and whether the file actually exists on
> the server ("file not found" status).
> As a result of this design decision made by Microsoft, the SMB2 protocol with
> its default configuration breaks any application relying on shared,
> concurrent data access.
> It is therefore absolutely required to reconfigure the SMB2 cache of the
> local workstation to not cache file meta information.
> Alaska Software provides to its customers and their end-users an MSI
> installation package which reconfigures the SMB2 cache accordingly.
> This MSI package needs to be executed on any Vista and Windows 7 workstation
> in a network to ensure that no data loss or data corruption occurs when
> accessing files concurrently. 
> This package creates and modifies values under following registry key:
>   * HKEY_LOCAL_MACHINE\system\CurrentControlSet\Services\LanmanWorkstation\Parameters
> The following registry values are created and set to zero:
>   * FileInfoCacheLifetime
>   * FileNotFoundCacheLifetime
>   * DirectoryCacheLifetime
Comment 101 Steve Harper 2011-02-13 05:02:23 PST
Created attachment 512024 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

WADA thanks for the post, when the Helper App dialog is displayed the user can choose to open with an application or Save to disk. The helper app also handles the case where the user right clicks on a link and selects 'Save Link As'.

When the user clicks on a link the file immediately begins downloading to $temp. If the user clicks the 'Save File' button on the Helper App dialog the SaveToDisk method in nsExternalHelperAppService is called. This code closes the stream and moves whatever has already been downloaded to a operating system specific home directory. In windows this will be 'Downloads' on Linux this will probly be 'Home'. 

If the user right clicks on a link and selects 'Save Link As' the SaveToDisk method is called but this time nothing is download to temp and the empty temp file is moved to the user specified directory with the new file name.

My previous patch would only work in the second case where the user selects 'Save Link As'. In the the first case where the user clicks on the link and data is downloaded to $temp the data downloaded before the user clicked 'Save File' would be lost resulting in a corrupt file. This happens because I removed the MoveFile function call. What we need to do is move the temp file in $temp to the target directory so that it can be opened again for write in append mode, and the download can continue.

The new patch I have submitted changes nsiLocalFileWin.cpp in XPCOM. I think this is the best place for the change as any Moves/Copies to a Windows 7 share will now be fixed.

The patch works in the following way:

We need to pass the flag COPY_FILE_NO_BUFFERING to CopyFileExW calls. Previously MoveFileExW was been called with the flag MOVEFILE_COPY_ALLOWED which allowed a file to be moved to a different disk or network share. I have changed the logic in the CopySingleFile method to work the following way:

If the major version of the operating system is > 5 (Anything above Windows XP, Windows Server 2003, Windows 2000) and we need to perform a copy we set the COPY_FILE_NO_BUFFERING flag and pass it to CopyFileExW. If we are performing a move we try to Move the file to the new location without passing MOVEFILE_COPY_ALLOWED to MoveFileExW. If this fails and the error is ERROR_NOT_SAME_DEVICE we try to copy the file calling CopyFileExW with the COPY_FILE_NO_BUFFERING if this fails then we return the normal error as before.

This bug in the SMBV2 re director is only ever going to be an issue if both Windows machines support it. So we only need to pass the extra COPY_FILE_NO_BUFFERING flag if the major version of windows on the client machine is greater than 5 (Anything above Windows XP, Windows Server 2003 and Windows 2000). If one of the machines does not support SMBV2 then SMBV1 is used.

The COPY_FILE_NO_BUFFERING flag also has the side affect of increasing the performance of large file copies(According to MSDN).

Wan-Teh Chang, is this patch acceptable to you? Maybe someone who works on XPCOM should take a look at this patch.

Thanks,
Steve
Comment 102 Steve Harper 2011-02-13 05:08:22 PST
WADA the above registry keys may well fix the issue but we need a code based solution and altering the cache sizes will have performance implications when using a shared drive in general use.
Comment 103 WADA 2011-02-13 18:00:00 PST
(In reply to comment #102)
> we need a code based solution

I agree with you. This bug is similar case to bug 307527. In that bug, Win's bug with ACPI on multi-CPU could surely be resolved by /usepmtimer, but the solution is hard for general users and the solution was impossible with some Windows versions. Hotfix by MS was not provided for all users when the problem was found. So, code based solution was implemented by Wan-Teh Chang.

> altering the cache sizes will have performance implications when using a shared drive in general use.

xxxCacheLifetime=0 doesn't alter cache size for read/write of data in file. Those three registry setting is for retention period of "file meta data", i.e. for retention period of "Leasing Key"(referred as Directory Leasing Key in pdf file I pointed). xxxCacheLifetime=0 increases network traffics between an SMB2 client and SMB2 server only for "every update notification of file meta data to server" and "every inquiry of file meta data to server". It increases SMB2 server's workload too, but it's not far different from SMB1. It's similar to SMB1.
In this bug's case, update of a file is executed on one PC only, and concurrent file access to the file by other PC can be ignored. I think this bug is problem when concurrent access of a file from multiple file handles at single PC.
  File meta data updated by file-handle-1 is not immediately reflected
  to file meta data which file-handle-2 has, due to SMB2's caching design.    
Is buffered write or non buffered write relevant to this bug's problem?

FYI.

FileInfoCacheLifetime=0 and DirectoryCacheLifetime=0 is official solution by MS.

Following is another SMB2 relevant pdf.
> http://www.caseware.com/downloads/pdf/WorkingPapers/Client%20and%20Host%20Operating%20System%20Combinations%20and%20SMB2%20Recommendations.pdf
This pdf document points next MS KB 2028965 and KB 2461645.

> http://support.microsoft.com/kb/2028965
> Article ID: 2028965 - Last Review: August 20, 2010 - Revision: 4.0
> Data corruption when multiple users perform read and write operations
> to a shared file in the SMB2 environment
> Hotfix download is available
>
> Workaround
>  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters
>  FileInfoCacheLifeTime = 0
>   Notes
>    Type 0 to disable the time-out threshold for the local cache update.
>    The number that you type indicates the length in second of the time-out threshold for local cache update.
I think "local cache" in this context is not cache for read/write of data in the file.

> http://support.microsoft.com/kb/2461645
> Article ID: 2461645 - Last Review: November 5, 2010 - Revision: 1.0
> Mounted folders disappear in shared folders after navigating to them
> in Windows 7, Windows Vista, Windows Server 2008 or Windows Server 2008 R2
>
> Workaround
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters
>   DirectoryCacheLifetime = 0
Comment 104 WADA 2011-02-13 18:21:47 PST
Just an idea.
I think change like next can be a code based change to bypass Win's bug for this bug, because it's same solution as MS's hotfix.
  New prefs entries:
    general.bypass_MS_Win_bugs.SMB2.FileInfoCacheLifetime
    general.bypass_MS_Win_bugs.SMB2.FileNotFoundCacheLifetime
    general.bypass_MS_Win_bugs.SMB2.DirectoryCacheLifetime
  If true, and if not-extists or not ZERO, set ZERO to correnponding Win's
  registry key, and ask user to re-boot his Win if required.
If there is a way to know that SMB2 is used by both server and client, setting ZERO is better limited in SMB2 environment only.
Problem of this kind of solution;
  prefs.js will surelly be filled by bypass_MS_Win_bugs entries :-)
Comment 105 WADA 2011-02-13 20:12:44 PST
FYI.
(A) To know a drive on Win is network resource or not, APIs like next can be used.
> http://msdn.microsoft.com/en-us/library/aa370647(v=VS.85).aspx
> NetUseEnum Function
> http://msdn.microsoft.com/en-us/library/aa370648(v=VS.85).aspx
> NetUseGetInfo Function
> http://msdn.microsoft.com/en-us/library/aa385453(v=VS.85).aspx
> WNetGetConnection Function
I don't know whethr SMB version can be known via such API or not.
(B) Implimentation of SMB2 client on Linux looks in progress, seems tough work though :-) 
> http://www.snia.org/events/storage-developer2010/agenda2010/abstracts#ms_fsa
> http://www.snia.org/events/storage-developer2010/agenda2010/abstracts#winding_smb_cifs
> http://www.snia.org/events/storage-developer2010/agenda2010/abstracts#imp_smb2
Developers of SMB2 client on Linux looks already knows well about SMB2's problem of inconsistency of file meta data.
> http://www.snia.org/events/storage-developer2010/agenda2010/abstracts#meta_prob
> Storage Developer Conference 2010
I hope SMB2's file meta data caching is disabled by default on Linux.
(C) IBM's recommendation to "GPFS for Windows" user was to disable SMB2 at server side.
> http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs_faqs%2Fgpfsclustersfaq.html
> A workaround for this limitation is to disable the SMB2 protocol on the server.
In this document, reason why "disable of SMB2 is needed" is stated as that GPFS for Windows does not support a file system feature called Directory Change Notification. However, even if Directory Change Notification is supported, it's probably useless for normal applications like Mozilla, because Directory Change Notification of SMB2 is perhaps invoked after DirectoryCacheLifetime is expired at where the change occurs. It's usually too late for normal applications.

Steve Harper, do you think that we need to report problem of SMB2 found by this bug to MS?
Comment 106 Steve Harper 2011-02-14 09:34:58 PST
Can we get a code review for this patch?

Thanks
Comment 107 Shawn Wilsher :sdwilsh 2011-02-14 09:42:59 PST
(In reply to comment #106)
> Can we get a code review for this patch?
You should request it from jmathies and rstrong.
Comment 108 Nelson Bolyard (seldom reads bugmail) 2011-02-14 11:16:10 PST
Since this workaround for the Windows bug is not being implemented in NSPR,
please change this bug to the relevant Product and Component.
Comment 109 Shawn Wilsher :sdwilsh 2011-02-14 11:37:01 PST
(In reply to comment #108)
> Since this workaround for the Windows bug is not being implemented in NSPR,
> please change this bug to the relevant Product and Component.
Should we not take the NSPR patch at all?
Comment 110 Steve Harper 2011-02-14 13:02:43 PST
I can still consistently reproduce the issue with the NSPR patch. I think this patch tries to deal with the side affect of cached file metadata not correctly reflecting the actual bytes written. This appears to happen when we perform a normal buffered copy. Can we upload the XPCOM patch to the test server for people to try so we can confirm it resolves the issue for them? If not i am happy to post a link to an installer with this patch applied.

Thanks
Comment 111 Shawn Wilsher :sdwilsh 2011-02-14 13:35:56 PST
(In reply to comment #110)
> I can still consistently reproduce the issue with the NSPR patch. I think this
> patch tries to deal with the side affect of cached file metadata not correctly
> reflecting the actual bytes written. This appears to happen when we perform a
> normal buffered copy. Can we upload the XPCOM patch to the test server for
> people to try so we can confirm it resolves the issue for them? If not i am
> happy to post a link to an installer with this patch applied.
I would, but the patch doesn't seem to apply cleanly to mozilla-central.
Comment 112 Jim Mathies [:jimm] 2011-02-14 14:02:22 PST
(In reply to comment #111)
> (In reply to comment #110)
> > I can still consistently reproduce the issue with the NSPR patch. I think this
> > patch tries to deal with the side affect of cached file metadata not correctly
> > reflecting the actual bytes written. This appears to happen when we perform a
> > normal buffered copy. Can we upload the XPCOM patch to the test server for
> > people to try so we can confirm it resolves the issue for them? If not i am
> > happy to post a link to an installer with this patch applied.
> I would, but the patch doesn't seem to apply cleanly to mozilla-central.

$ patch -p1 < ../reviewpatches/patch.diff
(Stripping trailing CRs from patch.)
patching file xpcom/io/nsLocalFileWin.cpp

Applied it here. I can fire off a try build.
Comment 113 Jim Mathies [:jimm] 2011-02-14 14:06:44 PST
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jmathies@mozilla.com-a514f759f30a

Should show up in about six hours.
Comment 114 Steve Harper 2011-02-14 14:07:06 PST
Created attachment 512277 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

This patch has the unix style newline characters.

Hope this is ok,

Thanks,
Steve
Comment 115 Wan-Teh Chang 2011-02-14 15:20:05 PST
Comment on attachment 512277 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

Please add comments about the use of COPY_FILE_NO_BUFFERING
for Windows major version > 5.

Why do you need to define COPY_FILE_NO_BUFFERING?  Would be
nice to explain why with a comment.

>+        if (GetLastError() == ERROR_NOT_SAME_DEVICE)
>+        {
>+            copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
>+            
>+            if (copyOK)
>+                DeleteFile(filePath.get());
>+        }

This block of code needs to follow the ::MoveFileExW call
above, inside the same 'else' block.  Otherwise it will
miss the
    else if (move) // Set security permissions to inherit from parent.
line below on a successful simulated file move across volumes/devices.
Comment 116 WADA 2011-02-14 18:41:43 PST
Following is log lines for IRP_MJ_CLEANUP and IRP_MJ_CLOSE in my Process Monitor log. 
> "C:\Documents and Settings\wada\Local Settings\Temp\EZs_V5s0.zip","","IRP_MJ_CLEANUP","SUCCESS",""
> "C:\DOCUME~1\wada\LOCALS~1\Temp\EZs_V5s0.zip","","IRP_MJ_CLOSE","SUCCESS",""
Why short filename is used for long directry name part when IRP_MJ_CLOSE?

Steve Harper, is short filename used for directory part(and file name part) by Firefox upon CreateFile? If yes, it may affect on SMB2's file meta data caching even when concurrent file access by multi-threads/multi-file-handles at single MS Windows on single PC. It may produce different "file lease key" of SMB2 which is generated by an SMB2 client, and this may expose bug of Win.
  CopyFile by long filename           => file lease key #1
  Append Write open by short filename => file lease key #2
  Note: "Different lease keys for single file at server" is normal status,
        when file-B is link to file-A at server and file-A & file-B is opened.
Comment 117 Nelson Bolyard (seldom reads bugmail) 2011-02-14 19:04:10 PST
(In reply to comment #109)
> Should we not take the NSPR patch at all?

What NSPR patch??  
The non-obsolete patches attached to this bug all patch code in xpcom, not NSPR.
Comment 118 Wan-Teh Chang 2011-02-14 19:59:07 PST
Comment on attachment 510033 [details] [diff] [review]
Proposed patch v2

Nelson, this is the "NSPR patch" that sdwilsh referred to.
This patch modifies some NSPR code copied to XPCOM.

I think we should check in this patch (on the Mozilla
trunk only) to eliminate the dependency on the private
NSPR types.
Comment 119 Steve Harper 2011-02-15 13:17:59 PST
Created attachment 512581 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

Wan-Teh Chang Thanks very much for your feedback I have created a new patch with the changes you suggested. I hope this one is accceptable.

WADA: I use Windows 7 and the temp file generated by firefox does not use long file/directory names.

Thanks
Comment 120 Steve Harper 2011-02-15 13:28:51 PST
Created attachment 512586 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

Moved a comment inline with rest of code
Comment 121 Jim Mathies [:jimm] 2011-02-15 14:05:58 PST
Comment on attachment 512586 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.


>+ /* Bug 545650 - Pass the flag COPY_FILE_NO_BUFFERING to CopyFileEx when copying
>+    to a SMBV2 remote drive. Without this parameter subsequent append mode file 
>+    writes can cause the resultant file to become corrupt. We only need to do 
>+    this if the major version of Windows is > 5(Only Windows Vista and above can 
>+    support SMBV2). */
>     int copyOK;

Please nix the 'Bug XYZ' note, hg blame takes care of that. Also I'd prefer '//' style commenting here.

This states we use COPY_FILE_NO_BUFFERING only for SMBV2 remote drives, but that's not the case is it?

>-
>+    DWORD dwVersion = GetVersion();
>+    DWORD dwMajorVersion = (DWORD)(LOBYTE(LOWORD(dwVersion)));
>+    DWORD dwCopyFlags = 0;
>+    
>+    if (dwMajorVersion > 5)
>+       dwCopyFlags = COPY_FILE_NO_BUFFERING;
>+    
>     if (!move)
>-        copyOK = ::CopyFileW(filePath.get(), destPath.get(), PR_TRUE);
>+        copyOK = ::CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);


Is this necessary in all cases? MSDN states you should use this for large file transfers. I don't think we do a lot of file copying / moving on startup.. but we should keep an eye on talos runs when this lands.


>     else {
> #ifndef WINCE
>         DWORD status;
>         if (FileEncryptionStatusW(filePath.get(), &status)
>             && status == FILE_IS_ENCRYPTED)
>         {
>-            copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL,
>-                                 COPY_FILE_ALLOW_DECRYPTED_DESTINATION);
>+            dwCopyFlags |= COPY_FILE_ALLOW_DECRYPTED_DESTINATION;
>+            copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
> 
>             if (copyOK)
>                 DeleteFileW(filePath.get());
>         }
>         else
>         {
>             copyOK = ::MoveFileExW(filePath.get(), destPath.get(),
>                                    MOVEFILE_REPLACE_EXISTING |
>-                                   MOVEFILE_COPY_ALLOWED |
>                                    MOVEFILE_WRITE_THROUGH);
>+                                   
>+            if (!copyOK && GetLastError() == ERROR_NOT_SAME_DEVICE) // Check if copying the source file to a different volume. This could be an SMBV2 mapped drive.

Nit - move that comment up and limit it to 80 chars per line.

>+            {
>+                copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
>+            
>+                if (copyOK)
>+                    DeleteFile(filePath.get());
>+                else
>+                    ConvertWinError(GetLastError());

The only thing ConvertWinError does is return a value. You're grabbing it below on if (!copyOK). Looks like this can be snipped.
Comment 122 Steve Harper 2011-02-15 15:03:50 PST
Created attachment 512609 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

Jim Mathies thanks for taking the time to review this patch. I have submitted an updated one with the feedback you gave.

From my testing I dont think using the COPY_FILE_NO_BUFFERING flag has a detrimental affect when copying small files, its just not supported in Windows 2000/2003/XP. It does however appear to resolve this Windows 7 bug in all cases.

Thanks
Comment 123 Jim Mathies [:jimm] 2011-02-15 15:43:33 PST
Comment on attachment 512609 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

Pushed to try for a set of test runs.
Comment 124 Jim Mathies [:jimm] 2011-02-16 05:50:33 PST
(In reply to comment #123)
> Comment on attachment 512609 [details] [diff] [review]
> A patch to ensure all copy opperations performed in nsLocalFileWin use
> unbuffered I/O.
> 
> Pushed to try for a set of test runs.

Passed tests and talos number looked ok.
Comment 125 Steve Harper 2011-02-17 12:32:37 PST
Can we push this fix into trunk?
Comment 126 Shawn Wilsher :sdwilsh 2011-02-17 12:40:01 PST
It would need approval, although I'm not sure it'd get it given the current state of trying to get Firefox 4 out the door.
Comment 127 Jim Mathies [:jimm] 2011-02-17 12:57:35 PST
Unfortunately it was already marked non-blocking. The good news is we're starting a new rapid release cycle after 4.0. I've got this stored in my "land for the first quick turn around" (5.0) release. That should be in March, with a release shortly after.
Comment 128 Patrick Donker 2011-02-25 01:15:31 PST
So this fix wont be in 4 final??
Comment 129 Robert Strong [:rstrong] (use needinfo to contact me) 2011-02-25 01:17:13 PST
(In reply to comment #128)
> So this fix wont be in 4 final??
Correct.

Per the previous comment
(In reply to comment #127)
> Unfortunately it was already marked non-blocking. The good news is we're
> starting a new rapid release cycle after 4.0. I've got this stored in my "land
> for the first quick turn around" (5.0) release. That should be in March, with a
> release shortly after.
Comment 130 Patrick Donker 2011-02-25 01:20:21 PST
Rather shocking that a new version of FF will be released that will corrupt downloads when saved a network share. Great move guys, Redmond will be thrilled :s
Comment 131 Shawn Wilsher :sdwilsh 2011-02-25 07:56:49 PST
(In reply to comment #130)
Thanks for providing an incredibly useful comment to move the bug forward.  Your input is greatly appreciated.
Comment 132 Patrick Donker 2011-02-25 12:23:22 PST
Glad you are happy with a browser who corrupts your downloads. But I'm sure you can explain why the team comes to such a brilliant decision as of why that doesnt matter
Comment 133 Jim Mathies [:jimm] 2011-02-25 12:42:31 PST
(In reply to comment #132)
> Glad you are happy with a browser who corrupts your downloads. But I'm sure you
> can explain why the team comes to such a brilliant decision as of why that
> doesnt matter

Due to timing we can't land the fix this close to a release. The code applies to an area of the code base that is used for file access, surely you can see the risk of potential regressions here. 

Our next release after 4.0 should be sometime this summer, so assuming there are no regressions the bug will be fixed by then. You'll be able to use a version of Fx before that however as the patch will land on mozilla central and be in nightly builds and future betas as soon as the tree opens after we branch. This should happen within the next month or so. In the mean time there are simple work arounds to the problem you can use.
Comment 134 Patrick Donker 2011-02-25 13:44:31 PST
I can totally understand the reasoning, but please also take in consideration how a new release will be received by the general public (not to mention corporate), when they find out that downloading is a no go with this version.
Especially corporate will massively abandon 4 because they are most probably be using the exact configuration (W7x64/2k8R2) that is prone to this problem, no matter how many easy and simple work arounds are available
I for one will skip 4 in my enterprise set up until I am absolutely sure that this issue is resolved.
Comment 135 Shawn Wilsher :sdwilsh 2011-02-25 13:49:48 PST
(In reply to comment #134)
> I can totally understand the reasoning, but please also take in consideration
> how a new release will be received by the general public (not to mention
> corporate), when they find out that downloading is a no go with this version.
> Especially corporate will massively abandon 4 because they are most probably be
> using the exact configuration (W7x64/2k8R2) that is prone to this problem, no
> matter how many easy and simple work arounds are available
> I for one will skip 4 in my enterprise set up until I am absolutely sure that
> this issue is resolved.
This appears to hit all versions of Firefox, so 4.0 is no different than 3.6 or 3.5.
Comment 136 Patrick Donker 2011-02-25 13:51:49 PST
That is because this bug was never picked up and assigned to anybody
Comment 137 Drew 2011-02-25 13:52:22 PST
I'm wondering why it took so long for anyone else to even start paying attention to this problem, as bad as it is. I don’t think I saw one bit of conversation until the last month or so.
Comment 138 Patrick Donker 2011-02-25 13:58:50 PST
In earlier versions people didnt notice it very much as the W7x64/2k8R2 combo was less used than it is these days.
Comment 139 Steve Harper 2011-02-26 06:47:42 PST
Hi Patrick im sorry to here your frustration, but the fix applies to a portion of FF that is used by many other areas of the code. As this patch was submitted late introducing it into the FF 4 code base now would be far too risky and could introduce new problems in other areas of FF that were once working.

In the meantime it would be really helpfull if you could download the test build above and ensure it fixes any issues you have been experiancing regarding saving to a Windows 7 mapped drive.(I have only tested this fix on my machine)
Comment 140 Steve Harper 2011-02-26 06:52:17 PST
Jim Mathies: Can you update the try server with the latest patch containing the changes made after your code review? This way people will be testing against the actual code that will hit trunk.
Comment 141 Attila Mesterhazy 2011-02-26 08:15:33 PST
Where can I downloaded the fixed build?
Comment 142 Steve Harper 2011-02-28 09:19:22 PST
Sorry, The test build does not appear to be available anymore.
Comment 143 Jim Mathies [:jimm] 2011-02-28 09:38:45 PST
(In reply to comment #140)
> Jim Mathies: Can you update the try server with the latest patch containing the
> changes made after your code review? This way people will be testing against
> the actual code that will hit trunk.

Sure:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jmathies@mozilla.com-8db9d84db124

I fired off an optimized build, so they'll show up in about five hours.
Comment 144 Attila Mesterhazy 2011-03-03 04:41:41 PST
(In reply to comment #143)
> (In reply to comment #140)
> > Jim Mathies: Can you update the try server with the latest patch containing the
> > changes made after your code review? This way people will be testing against
> > the actual code that will hit trunk.
> 
> Sure:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jmathies@mozilla.com-8db9d84db124
> 
> I fired off an optimized build, so they'll show up in about five hours.

I tested it for three days and had no problems with downloading large files directly to a mapped network drive. (Win2008R2 server, Win7 Enterprise 32 bit client)
Comment 145 Jim Mathies [:jimm] 2011-03-28 05:37:52 PDT
http://hg.mozilla.org/mozilla-central/rev/315cbee8f00c
Comment 146 KiboOst 2011-03-28 06:51:22 PDT
How can I apply the patch ? Is it for FF3.6 or 4.0 ? Or should we wait for a 4.01 ?
Comment 147 Jim Mathies [:jimm] 2011-03-28 07:15:05 PDT
(In reply to comment #146)
> How can I apply the patch ? Is it for FF3.6 or 4.0 ? Or should we wait for a
> 4.01 ?

Right now it's only  in 5.0 (mc), but assuming there aren't issues with it I'll request approval to land it in a 4.0 point release. There are no guarantees it'll get approved for that though.
Comment 148 Shawn Wilsher :sdwilsh 2011-03-28 07:19:20 PDT
(In reply to comment #146)
> How can I apply the patch ? Is it for FF3.6 or 4.0 ? Or should we wait for a
> 4.01 ?
The patch is for Firefox 5.
Comment 149 Kevin Brosnan [:kbrosnan] 2011-03-28 07:20:16 PDT
The patch is currently checked into the trunk. http://nightly.mozilla.org will have builds today or tomorrow with the patch, I am not sure if it got in early enough for today's nightly. The first release with this patch will be Firefox 5. There is a small chance it would be taken for a 4.0.x release but given the release cadence we are aiming for it is unlikely.
Comment 150 Patrick Donker 2011-05-20 15:25:18 PDT
I'm running 5b2 and am please to report the problem being resolved :)
Thanks!
Comment 151 neil@parkwaycc.co.uk 2012-10-07 16:49:40 PDT
Comment on attachment 512609 [details] [diff] [review]
A patch to ensure all copy opperations performed in nsLocalFileWin use unbuffered I/O.

>             if (copyOK)
>                 DeleteFileW(filePath.get());
>         }
>         else
>         {
>             copyOK = ::MoveFileExW(filePath.get(), destPath.get(),
>                                    MOVEFILE_REPLACE_EXISTING |
>-                                   MOVEFILE_COPY_ALLOWED |
>                                    MOVEFILE_WRITE_THROUGH);
>+            
>+            // Check if copying the source file to a different volume,
>+            // as this could be an SMBV2 mapped drive.
>+            if (!copyOK && GetLastError() == ERROR_NOT_SAME_DEVICE)
>+            {
>+                copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
>+            
>+                if (copyOK)
>+                    DeleteFile(filePath.get());
[Mozilla style used to be to explicitly use the W-suffixed versions, as above.]

Note You need to log in before you can comment on or make changes to this bug.