Closed Bug 727551 Opened 13 years ago Closed 10 years ago

Create a Windows-native alternate to msys rm.exe to avoid common problems deleting files

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86
Windows Server 2003

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: vlad)

References

Details

Attachments

(1 file, 2 obsolete files)

I wrote a tool to delete files on windows using native win32 api calls. Lets deploy this to our windows build and test slaves.
Whiteboard: [opsi]
jhford, any chance we can get this code public in Open Source so that its not a black box if something is wrong with it?
(In reply to Justin Wood (:Callek) from comment #2) > jhford, any chance we can get this code public in Open Source so that its > not a black box if something is wrong with it? https://bugzilla.mozilla.org/show_bug.cgi?id=583129#c8, http://blog.johnford.org/writting-a-native-rm-program-for-windows/
Component: Release Engineering → Release Engineering: Platform Support
OS: Mac OS X → Windows Server 2003
Priority: -- → P3
QA Contact: release → coop
Unlike the Win32 slaves, which hit rm problems maybe once a week with dozens of slaves doing jobs all day and night, the Win64 test slaves (which are only visible on https://tbpl.mozilla.org/?tree=Jetpack, but are also on https://tbpl.mozilla.org/?noignore=1) hit rm problems more like once every push, so if we're going to actually get serious about Win64 this blocks it.
Assignee: nobody → coop
I think the easiest way to do this is to drop winrm.exe into the existing path renamed as rm.exe such that the new rm.exe is used instead of the stock msys one. Casual inspection indicates that dropping winrm.exe into the mozilla-build/ root should allow this approach to work on build machines. On test machines, it might be safer to rename the existing msys rm.exe and replace it with this new one. Maybe we do this everywhere just to be consistent. Replacing the existing rm.exe would allow us to avoid having to do platform detection in the buildbot code everywhere where we want to remove files, i.e. any existing |rm -rf| should just work, with the added benefit of working for any file removals happening as part of the build process itself. Is this a workable deployment strategy or am I missing something here?
Status: NEW → ASSIGNED
Priority: P3 → P2
Sounds workable. Perhaps we want to file a bug for MozillaBuild to take this change too? BTW did we completely discarded just using rmdir? or this is in general so whenever rm is called *anywhere* we use jhford's version?
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #8) > BTW did we completely discarded just using rmdir? or this is in general so > whenever rm is called *anywhere* we use jhford's version? I think we did discount the idea of using rmdir. It would add (more) special case code for Windows, whereas jhford's winrm is a drop-in replacement for rm.
I've started deployment this morning. So far I've done the t-r3-w764 slaves, and will update my progress as I go.
Summary: deploy winrm.exe to windows build slaves → deploy winrm.exe to windows slaves
Decided explicitly *not* to deploy this to OPSI, but will make sure that it makes it into the old ref images (just in case) and makes it into GPO and the current pre-GPO win64 imaging process.
Whiteboard: [opsi]
Depends on: 765368
I've reverted back to the msys version of rm.exe everywhere. I took "Let's deploy this" in comment #0 and comment #1 to mean we were ready to deploy this. Guess not. :/ jhford: Given https://bugzilla.mozilla.org/show_bug.cgi?id=765368#c3 , what systems was this actually tested on?
Coop asked me to comment about the scope of this work, as someone who knows C++ I suspect 1-3 hours, give or take if I was to do it, since I know C++ on-windows well, but not these File Manip APIs, so would spend the majority of my time learning how this code is written and MSDN pages about these APIs. If John/someone-else-who-knows-these-apis-well, went to try this, could probably cut that time in half or more. If we wanted to just *blindly* make the failure non-fatal by removing return FALSE; here, we could probably turnaround on a binary in 30 minutes or less for testing. *whatever* choice we do, we need to stage this (imo on all win OS types we use, including 32-bit w2k3 builders)
(In reply to Chris Cooper [:coop] from comment #12) > I've reverted back to the msys version of rm.exe everywhere. I took "Let's > deploy this" in comment #0 and comment #1 to mean we were ready to deploy > this. Guess not. :/ > > jhford: Given https://bugzilla.mozilla.org/show_bug.cgi?id=765368#c3 , what > systems was this actually tested on? I tested it as a standalone application a few different systems. I never expected that it'd overwrite the msys rm program. Was this seen while it ran on staging? Justin, I don't have access to a windows build environment right now, but here is a patch that should remove the Wow64 stuff completely. The potential fallout is that if the program runs as a 32bit binary on a 64bit system, it will be unable to delete files that Wow64 provides (should be limited to system32, and the likes). I don't have an exhaustive list of files that are wrapped by wow64, but my guess is that they'd all be system libraries which i assume shouldn't be deleted in CI processes! http://hg.mozilla.org/users/jford_mozilla.com/jhford-native-rm/rev/84a240176b93 This should be a matter of cloning on a win2k3 slave and running cl rm.cpp
(In reply to John Ford [:jhford] from comment #14) > I don't have access to a windows build environment right now, To be clear I haven't compiled this to see if it's valid.
Depends on: 765442
(In reply to John Ford [:jhford] from comment #14) > (In reply to Chris Cooper [:coop] from comment #12) > I tested it as a standalone application a few different systems. I never > expected that it'd overwrite the msys rm program. Was this seen while it > ran on staging? "Deploy" means something different to the two of us. Changing the bug summary to avoid further confusion. For those following along at home, Callek is going to take a stab at updating the source to avoid/ignore the Wow64 problems. He'll give me something I can test in staging.
Summary: deploy winrm.exe to windows slaves → Create a Windows-native alternate to msys rm.exe to avoid common problems deleting files
To start with, Creating the .exe is easy I created this by: * Login to a preprod w2k8 builder as cltbld * Launch start-msvc10 * hg clone https://hg.mozilla.org/build/tools (where this now lives) * pushd tools/buildfarm/utils/winrm * make * <copy out winrm.exe to somewhere> * popd * rm -rf tools --- I tossed winrm on the buildvpn in my user directory, since I've yet to get myself a people account [I might as well soon] and I think I applied the proper chmods to let others get it. vpn1.dmz.releng.scl3.mozilla.com:/home/Callek/winrm.exe Any problems grabbing this; just ping me.
I've grabbed the updated file and am testing it on a win64 builder in staging right now.
Hrmmm, this version doesn't handle wildcards. Deleting individual files or directories when specified precisely works fine, but the following fails, e.g.: rm -rf obj-firefox/dist/firefox-* obj-firefox/dist/fennec* obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe Invalid file attributes for "obj-firefox/dist/install/sea/*.exe" Invalid file attributes for "obj-firefox/dist/seamonkey*" Invalid file attributes for "obj-firefox/dist/fennec*" Invalid file attributes for "obj-firefox/dist/firefox-*"
Nope, wildcards weren't specifically implemented but my understanding of the API that I used is that it supported "*" and "?" in path names [1] and would expand the wildcards as expected. I am not sure why it isn't expanding the wildcards. [1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx
I'm trying a new version of the tool in staging that just ignores files with bad attributes, which I'm assuming to be non-existent files or directory hierarchies. I don't know if that's a valid assumption. Please speak up if you think it's not. In limited testing so far, the new version is successfully removing files and ignoring missing trees.
I think the problem is that del() needs to use the FindFirstFileW api on each of the paths passed in for wildcard expansion. I expected that the wildcards would have been expanded by the shell, but it might be that because it's native win32, msys doesn't munge it's argv. Ignoring the invalid attribute results in not expanding the wildcard which in turn means that nothing is deleted. A workaround would be to change all the rm commands where winrm would also be used to "ls obj-firefox/dist/firefox-* obj-firefox/dist/fennec* obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe | xargs rm -rf" that would be a no-op on linux and mac platforms. Provided that xargs and winrm play nicely, it would let msys do wildcard expansion and unix->win32 path translations.
(In reply to John Ford [:jhford] from comment #23) > Ignoring the invalid attribute results in not expanding the wildcard which > in turn means that nothing is deleted. I don't think this is the case. It's not that the tool isn't expanding the wildcard, it's that one of the subdirs in the path doesn't exist, which triggers the bad attributes error. Wildcard or no, if the path doesn't exist, there was nothing to delete in the first place. I can test this pretty trivially by hand to verify things are being deleted recursively when they should be. > A workaround would be to change all the rm commands where winrm would also > be used to > > "ls obj-firefox/dist/firefox-* obj-firefox/dist/fennec* > obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe | xargs rm > -rf" Yes, but all things being equal, I would prefer to *not* need special-case logic on Windows for deleting files. We do file removals from within the build system as well, so it would be ideal if the new tool is really a drop-in replacement for rm.
Hrmm, hitting the same problems as with the stock msys rm tool on w7, which is our worst offending platform based on the volume of reports in bug 692715. 'rm' '-rf' 'tools' in dir c:\talos-slave\test\. (timeout 1200 secs) "tools\.hg\store\data\compare-locales" - The directory is not empty. "tools\.hg\store\data" - The directory is not empty. "tools\.hg\store" - The directory is not empty. "tools\.hg" - The directory is not empty. "tools" - The directory is not empty. There's nothing *in* the compare-locales subdir, so maybe a lightweight retry would save us here? Bug 712205 is the real solution for the tools dir, but doesn't help us when cleaning up other things.
I'm going to switch focus here and pursue the theory in https://bugzilla.mozilla.org/show_bug.cgi?id=692715#c2 namely that some service on the Win7 boxes is holding those files open at the wrong time. It's intermittent, but I seem to be able to trigger it in one out of every three runs via buildbot. I should be able to script something up to narrow it down.
Blocks: 784317
Blocks: 692715
Status: ASSIGNED → NEW
Priority: P2 → P3
Given the problems we've had with this, I think the more workable solution is to try to use the existing Windows native tools (i.e. rmdir) in the automation.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
FYI, I've created a new Windows API-based directory removal routine in http://hg.mozilla.org/build/tools/file/92c703e1c26d/clobberer/clobberer.py#l42 which may be of interest here.
I just fixed some bugs in jhford's original winrm impl, and added support for NTFS symlinks and junctions. Code is at https://github.com/vvuk/winrm I'm currently using it on my dev machines, and am not having any problems with it so far. It rm -rf's an obj dir much faster (about 7s) than msys (~30s) or the windows explorer (~18s). Note that the code in clobberer.py doesn't handle windows symlinks or junctions; they're both a little esoteric, but I'm hoping to be able to start using symlinks where we can. Junction points would also be good.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #29) > I just fixed some bugs in jhford's original winrm impl, and added support > for NTFS symlinks and junctions. Code is at https://github.com/vvuk/winrm > > I'm currently using it on my dev machines, and am not having any problems > with it so far. It rm -rf's an obj dir much faster (about 7s) than msys > (~30s) or the windows explorer (~18s). > > Note that the code in clobberer.py doesn't handle windows symlinks or > junctions; they're both a little esoteric, but I'm hoping to be able to > start using symlinks where we can. Junction points would also be good. Hey Vlad, Thanks for the unexpected help! I can get this into staging for testing this week. Should I be able to replace the default msys rm command with this, or will I need to change the underlying calls in the automation? FWIW, I'm not too worried about clobberer support here. The vast majority of the cases I've seen of bug 692715 have been from trying to remove the tools repo at the start of a build. Can we switch clobberer to use this code too once it's proven elsewhere?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Product: mozilla.org → Release Engineering
Yep, should just be a drop-in replacement. I copied the original to rm-orig.exe and put this one in. The only issue I've seen so far is that long command lines cause msys to say "rm.exe: Bad file number", which doesn't happen with the original msys rm.exe. I'm not sure why that is; there isn't any explicit limit, other than Windows' regular command line limit. But maybe msys has some workarounds for long command lines? (We're talking very long here, like > 32k chars.)
(In reply to comment #31) > Yep, should just be a drop-in replacement. I copied the original to > rm-orig.exe and put this one in. > > The only issue I've seen so far is that long command lines cause msys to say > "rm.exe: Bad file number", which doesn't happen with the original msys rm.exe. > I'm not sure why that is; there isn't any explicit limit, other than Windows' > regular command line limit. But maybe msys has some workarounds for long > command lines? (We're talking very long here, like > 32k chars.) Do you get to main() when that happens?
I tried this out in staging this week on w64-ix-slave03, but we failed in |make package-tests|: make.py[0]: Entering directory 'e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\obj-firefox' e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\testing\testsuite-targets.mk:437:0$ rm -rf ./dist/test-package-stage e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\testing\testsuite-targets.mk:437:0: command 'rm -rf ./dist/test-package-stage' failed, return code 1 error getting file attributes for "./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test\browser_styleinspector_bug_677930_urls_clickable\browser_styleinspector_bug_677930_urls_clickable.css": The system cannot find the path specified. "./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test\browser_styleinspector_bug_677930_urls_clickable": The directory is not empty. "./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test": The directory is not empty. "./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector": The directory is not empty. "./dist/test-package-stage\mochitest\content\browser\browser\devtools": The directory is not empty. "./dist/test-package-stage\mochitest\content\browser\browser": The directory is not empty. "./dist/test-package-stage\mochitest\content\browser": The directory is not empty. "./dist/test-package-stage\mochitest\content": The directory is not empty. "./dist/test-package-stage\mochitest": The directory is not empty. "./dist/test-package-stage": The directory is not empty. The directory is still there and the slave is currently disabled if people want to poke at it. Just let me know.
I definitely do -- can I get access?
Depends on: 906245
So from looking on the machine, there is no "./dist/test-package-stage" in e:\builds\moz2_slave\m-aurora-w32-00000000000000000 I'm guessing what's happening here is that there are multiple processes that are trying to remove ./dist/test-package-stage in parallel. That would be consistent with the error coming up -- one process managed to delete the file just before another did, and maybe did it in a different order or slightly different speeds, and then tried to delete all the directories. Is there a full log available of the run? The multiple rm commands might be visible to check this.
Ah! This is because the directory exceeds PATH_MAX, and we need to use \\?\ paths to be able to have 32k long paths. I'll fix this, but it might take me a few days due to some PTO and other priorities.
Vlad: ping - don't suppose you've had a chance to poke at this in the intervening months?
If necessary and useful, we can probably hack quickly a JS implementation based on OS.File. I realize that this might be overcomplicated for the task at hand, though.
(In reply to comment #38) > If necessary and useful, we can probably hack quickly a JS implementation based > on OS.File. I realize that this might be overcomplicated for the task at hand, > though. I don't think a js version will fix the performance problem!
It is my understanding that the performance problem is due to the incorrect use of disk I/O rather than algorithms. However, booting the JS VM might kill any/some benefit.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #39) > (In reply to comment #38) > > If necessary and useful, we can probably hack quickly a JS implementation based > > on OS.File. I realize that this might be overcomplicated for the task at hand, > > though. > > I don't think a js version will fix the performance problem! Especially when it also needs to initialize gecko.
Oh right, I knew there was something else to do on wirm. Sorry, will look at it in the next few days (I'm working with glandium to enable symlinks in our build too). The issue with \\?\ paths is that they can't be relative, so this just needs to turn all the arguments into absolute paths first. Not rocket science, thankfully.
(In reply to comment #41) > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #39) > > (In reply to comment #38) > > > If necessary and useful, we can probably hack quickly a JS implementation based > > > on OS.File. I realize that this might be overcomplicated for the task at hand, > > > though. > > > > I don't think a js version will fix the performance problem! > > Especially when it also needs to initialize gecko. And especially because it won't use the correct OS APIs.
So here's the interesting thing. MSYS doesn't handle long path names at all: vladimir@wrex[2009]$ pwd /c/proj/tmp/long/this-is-a-long-path-component-0/this-is-a-long-path-component-1/this-is-a-long-path-component-2/this-is-a-long-path-component-3/this-is-a-long-path-component-4/this-is-a-long-path-component-5/this-is-a-long-path-component-6 vladimir@wrex[2010]$ pwd | wc -c 241 vladimir@wrex[2011]$ touch this-is-a-long-filename.txt touch: cannot touch `this-is-a-long-filename.txt': File or path name too long Python doesn't like this either: >>> import os >>> os.mkdir("this-is-a-long-path-name-here") Traceback (most recent call last): File "<stdin>", line 1, in <module> WindowsError: [Error 3] The system cannot find the path specified: 'this-is-a-long-path-name-here' >>> os.mkdir("foo") >>> quit() ("foo" works since it's under the 260 char limit) *Explorer* doesn't like this -- trying to create a new folder in explorer gives me a dialog that says "The file name(s) would be too long for the destination folder. You can shorten the file name and try again, or try a location that has a shorter path. [OK]"... and this is just doing "New Folder", so the default new folder name is too long. So how are we getting away with creating these paths in our build in the first place? However, for this rm implementation, it turns out that the Windows GetFullPathName function is "dumb", for better or worse. It will happily construct a path longer than MAX_PATH. This means that I can use it to make pathnames absolute and \\?\ before feeding in to further things.
Okay, I implemented this in https://github.com/vvuk/winrm Binary is at http://people.mozilla.com/~vladimir/misc/winrm.exe This really could use a test suite :p Also I haven't been able to directly test the long path name removal since I can't actually successfully create such a path, not easily anyway. Note that I ran into problems with mozmake.exe with long path names in our testing dirs too, so we may end up in a spot where we just need to avoid excessively long path names. Windows gives us 260 chars to work with; if we can keep our tree-relative paths "mozilla-central/..." to < 200, that should give us plenty of chars for people's root dirs. But that's a bit longer work. We haven't exactly tried to minimize path usage, though it's our tests that are the biggest (only?) culprit.
You should be able to call CreateDirectoryW to get such long path names, right?
Blocks: 950850
Attached file create a very long path in Windows (obsolete) —
Vladimir, here is the script I used for manual testing of my Python-based long path removal script (bug 860426).
Attached file create a very long path in Windows (obsolete) —
fixed copy/paste error in original patch
Attachment #8349526 - Attachment is obsolete: true
attach file directly
Attachment #8349780 - Attachment is obsolete: true
I created a C++ testcase (that script didn't actually create anything, and I didn't have win32file ;)... seemed to work fine with very long paths (500+ chars).
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #45) > Binary is at http://people.mozilla.com/~vladimir/misc/winrm.exe I'm testing this version in staging right now.
No issues in staging over the holidays. We ran a range of nightlies, dep, PGO, and repack builds. I'm going to manually deploy this (reversibly) to a handful of production slaves, say 10, and monitor them for a day under normal load. If we don't see any deletion issues, I can deploy to the rest of the pool and file a relops bug to get this added to GPO.
I've installed the new rm.exe on w64-ix-slave[40-49]. I'll keep an eye on them until tomorrow and then assess for wider deployment. Also letting armenzg know as buildduty.
(In reply to Chris Cooper [:coop] from comment #53) > I've installed the new rm.exe on w64-ix-slave[40-49]. I'll keep an eye on > them until tomorrow and then assess for wider deployment. > > Also letting armenzg know as buildduty. No issues observed in build or try. Starting deployment to all Windows build/try slaves.
Great -- has there been any observed performance win, out of curiosity?
(In reply to Chris Cooper [:coop] from comment #54) > No issues observed in build or try. Starting deployment to all Windows > build/try slaves. winrm has been deployed on all the w64-ix slaves. Filed bug 958229 for the small VLAN inconsistency I found while doing so. Still need to file the follow-up to get this change into GPO. Also need to test winrm on the test slaves, but the lack of clobber requests and associated source trees on those machines makes that less pressing.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #55) > Great -- has there been any observed performance win, out of curiosity? Yes, there is a performance win. Sorry, it would have been good to report on that. For simpler tasks like removing the tools dir, I see about a 50% speedup (10s instead of 20s). For bigger tasks like removing the build dir at the end of the nightly, we save about 2 minutes, or a 25% speedup (360s instead of 480s).
Running into some problems with |make package-tests|, but only on mozilla-beta. Wondering whether there is a build system issue here. https://tbpl.mozilla.org/?tree=Mozilla-Beta&rev=46cd787239b7 If we need to back this change out, we can log into each slave and execute the following: cd /bin cp rm-orig.exe rm.exe If a straight copy doesn't work, you can use runas: cd /bin runas /User:Administrator "cp /bin/rm-orig.exe /bin/rm.exe" If that doesn't work, VNC into the slave, right-click the command prompt and run as administrator, then run: c:\mozilla-build\msys\bin\bash.exe --login -i cd /bin cp rm-orig.exe rm.exe
No surprise, my retrigger on esr24 failed the same way, so if we decide to wait it out we'll be waiting until esr24 dies.
Depends on: 939719
This has been reverted due to the failures on beta, release, and esr. :( I'll start running against beta in staging to diagnose.
I've applied the patch from bug 939719 against a copy of mozilla-beta and am testing a build now. The patch applied cleanly with minimal fuzz, so I'm optimistic we can get this uplifted if it works.
(In reply to Chris Cooper [:coop] from comment #61) > I've applied the patch from bug 939719 against a copy of mozilla-beta and am > testing a build now. The patch applied cleanly with minimal fuzz, so I'm > optimistic we can get this uplifted if it works. Patch applies cleanly with fuzz on mozilla-release. There are rejects on mozilla-esr24, but it looks like the code has just been reorganized. I'm going to test some basic builds on mozilla-esr24 with the patch and then request uplift everywhere.
After I started using the correct rev1 w64 slave for the esr24 jobs, things went a lot better. I'll start making uplift requests for bug 939719 to beta, release, and esr24 today.
After some hiccups on esr24, uplift is complete. I'm planning to re-deploy winrm tomorrow AM EST.
Note one bug I just found -- removing a dir via "rm -rf foo/" (note the trailing slash) errors out. I'll fix this shortly. Wonder if that was maybe contributing to the earlier problem?
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65) > Note one bug I just found -- removing a dir via "rm -rf foo/" (note the > trailing slash) errors out. I'll fix this shortly. Wonder if that was > maybe contributing to the earlier problem? Quite possibly. I'm still working through some Thunderbird builds prior to deploying, despite what I said in comment #64.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65) > Note one bug I just found -- removing a dir via "rm -rf foo/" (note the > trailing slash) errors out. I'll fix this shortly. OK, holding off on any deployment plans until the new version is ready. This seems like a case we could trip over quite easily.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65) > Note one bug I just found -- removing a dir via "rm -rf foo/" (note the > trailing slash) errors out. I'll fix this shortly. Vlad: any idea when you'll have a chance to revisit winrm?
Flags: needinfo?(vladimir)
Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately. :/
Flags: needinfo?(vladimir)
I would also love to create a test suite, but do you have any idea how annoying it is to think about a test suite for "rm"?! :)
No longer blocks: 977228
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #69) > Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately. > :/ Re-pinging after MWC.
(In reply to Chris Cooper [:coop] [away until April 3] from comment #71) > (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #69) > > Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately. > > :/ > > Re-pinging after MWC. Re-assigning and re-pinging. Vlad: can you suggest someone else to follow-up with here if you can't get to this?
Assignee: coop → vladimir
I took a stab at a patch. There's a pending pull request: https://github.com/vvuk/winrm/pull/1
Sorry, I actually fixed this a few weeks ago and forgot to push/post :/ The github repo is updated, as well as http://people.mozilla.com/~vladimir/misc/winrm.exe .
I'm not sure this will fix bug 981298 fwiw -- every once in a while windows does get in a weird NTFS state where it thinks something has a lock/is using a directory, when nothing is. I never figured out what was actually causing it and how to fix it, short of a reboot.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #76) > I'm not sure this will fix bug 981298 fwiw -- every once in a while windows > does get in a weird NTFS state where it thinks something has a lock/is using > a directory, when nothing is. I never figured out what was actually causing > it and how to fix it, short of a reboot. I'll still take the 70% speedup this gives us on file deletions in the pathological cases (3 levels of nested m-c clones), based on my testing today. I'll start deployment this week and make sure this gets into GPO.
Status: REOPENED → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Awesome! For bug 981298, that's not caused by this; no idea how things ever get in that state. Should be independent of rm implementations.
Sigh. We hit bug 1013705 in the release process, and we're currently working around it with https://bugzilla.mozilla.org/show_bug.cgi?id=1013700#c2. I've managed to reproduce it in staging too. command: START command: rm -rf /c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current command: cwd: c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000 command: output: command: END (0.00s elapsed) command: START command: mkdir /c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current command: cwd: c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000 command: output: command: ERROR Traceback (most recent call last): File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\util\commands.py", line 47, in run_cmd return subprocess.check_call(cmd, **kwargs) File "c:\mozilla-build\python27\lib\subprocess.py", line 511, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['mkdir', '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current']' returned non-zero exit status 1 command: END (0.01s elapsed) The following tracebacks were detected during repacks: fr: Traceback (most recent call last): File "c:/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/scripts/scripts/l10n/create-release-repacks.py", line 124, in createRepacks buildNumber=buildNumber, stageServer=stageServer) File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\build\l10n.py", line 183, in repackLocale run_cmd(['mkdir', current]) File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\util\commands.py", line 47, in run_cmd return subprocess.check_call(cmd, **kwargs) File "c:\mozilla-build\python27\lib\subprocess.py", line 511, in check_call raise CalledProcessError(retcode, cmd) CalledProcessError: Command '['mkdir', '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current']' returned non-zero exit status 1 Full log is here for those with access: http://dev-master1.srv.releng.scl3.mozilla.com:8044/builders/release-mozilla-beta-win32_standalone_repack/builds/6/steps/run_script/logs/stdio Logging into the slave, the "current" dir still seems to have all/most of its contents. Not sure why it's returning so quickly here without actually completing the action.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I've disabled the slave in question if anyone wants to take a look. Running the same rm command from the login shell (ssh) works as expected, of course.
Interesting. winrm doesn't even seem to be finding the directory. The corresponding mkdir call is expanding out the path as follows: Expanded out '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current' => '\\?\c:\c\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\mozilla-beta\obj-l10n\dist\current' mkdir: cannot create directory `/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current': File exists So it seems that winrm is unable to munge the path in the same way in this context.
(In reply to Chris Cooper [:coop] from comment #81) > Expanded out > '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/ > dist/current' => > '\\?\c:\c\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\mozilla-beta\obj- > l10n\dist\current' > mkdir: cannot create directory > `/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/ > dist/current': File exists Nevermind. That's actually poorly buffered output from running winrm with the verbose option. So it looks like winrm is doing path translation here when it doesn't need to.
It's transforming /c/foo in \\?\c:\c\foo (notice the two "c")
How is winrm built? Is it built with msys gcc? That would explain it: msys doesn't translate msys paths to windows paths when executing msys applications.
It's built with msvc; it doesn't touch or link to msys at all. I noticed the c:\c\ -- that's the problem. Which make/shell is running this? Whatever is executing winrm needs to translate paths to normal windows-style paths; winrm doesn't understand msys/cygwin paths. I mean, I can *make* it do so, with a command line option maybe.. but our build system is a quagmire of different types of paths, it would be better if we could just understand what's causing wrong paths to be generated.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #85) > I mean, I can *make* it do so, with a command line option maybe.. but our > build system is a quagmire of different types of paths, it would be better > if we could just understand what's causing wrong paths to be generated. The short answer is "l10n," but I'm happy to fix that script over in bug 1013705 instead of this tool.
Status: REOPENED → RESOLVED
Closed: 11 years ago10 years ago
Resolution: --- → FIXED
Blocks: 981298
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #8) > Perhaps we want to file a bug for MozillaBuild to take this change too? I filed bug 1025099.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: