Closed
Bug 727551
Opened 13 years ago
Closed 10 years ago
Create a Windows-native alternate to msys rm.exe to avoid common problems deleting files
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jhford, Assigned: vlad)
References
Details
Attachments
(1 file, 2 obsolete files)
347 bytes,
text/x-python-script
|
Details |
I wrote a tool to delete files on windows using native win32 api calls. Lets deploy this to our windows build and test slaves.
Reporter | ||
Updated•13 years ago
|
Whiteboard: [opsi]
Reporter | ||
Comment 1•13 years ago
|
||
Comment 2•13 years ago
|
||
jhford, any chance we can get this code public in Open Source so that its not a black box if something is wrong with it?
Reporter | ||
Comment 3•13 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #2)
> jhford, any chance we can get this code public in Open Source so that its
> not a black box if something is wrong with it?
https://bugzilla.mozilla.org/show_bug.cgi?id=583129#c8, http://blog.johnford.org/writting-a-native-rm-program-for-windows/
Comment 4•13 years ago
|
||
Updated•13 years ago
|
Component: Release Engineering → Release Engineering: Platform Support
OS: Mac OS X → Windows Server 2003
Priority: -- → P3
QA Contact: release → coop
Comment 6•13 years ago
|
||
Unlike the Win32 slaves, which hit rm problems maybe once a week with dozens of slaves doing jobs all day and night, the Win64 test slaves (which are only visible on https://tbpl.mozilla.org/?tree=Jetpack, but are also on https://tbpl.mozilla.org/?noignore=1) hit rm problems more like once every push, so if we're going to actually get serious about Win64 this blocks it.
Updated•13 years ago
|
Assignee: nobody → coop
Comment 7•12 years ago
|
||
I think the easiest way to do this is to drop winrm.exe into the existing path renamed as rm.exe such that the new rm.exe is used instead of the stock msys one.
Casual inspection indicates that dropping winrm.exe into the mozilla-build/ root should allow this approach to work on build machines. On test machines, it might be safer to rename the existing msys rm.exe and replace it with this new one. Maybe we do this everywhere just to be consistent.
Replacing the existing rm.exe would allow us to avoid having to do platform detection in the buildbot code everywhere where we want to remove files, i.e. any existing |rm -rf| should just work, with the added benefit of working for any file removals happening as part of the build process itself.
Is this a workable deployment strategy or am I missing something here?
Status: NEW → ASSIGNED
Priority: P3 → P2
Comment 8•12 years ago
|
||
Sounds workable.
Perhaps we want to file a bug for MozillaBuild to take this change too?
BTW did we completely discarded just using rmdir? or this is in general so whenever rm is called *anywhere* we use jhford's version?
Comment 9•12 years ago
|
||
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #8)
> BTW did we completely discarded just using rmdir? or this is in general so
> whenever rm is called *anywhere* we use jhford's version?
I think we did discount the idea of using rmdir. It would add (more) special case code for Windows, whereas jhford's winrm is a drop-in replacement for rm.
Comment 10•12 years ago
|
||
I've started deployment this morning. So far I've done the t-r3-w764 slaves, and will update my progress as I go.
Summary: deploy winrm.exe to windows build slaves → deploy winrm.exe to windows slaves
Comment 11•12 years ago
|
||
Decided explicitly *not* to deploy this to OPSI, but will make sure that it makes it into the old ref images (just in case) and makes it into GPO and the current pre-GPO win64 imaging process.
Whiteboard: [opsi]
Comment 12•12 years ago
|
||
I've reverted back to the msys version of rm.exe everywhere. I took "Let's deploy this" in comment #0 and comment #1 to mean we were ready to deploy this. Guess not. :/
jhford: Given https://bugzilla.mozilla.org/show_bug.cgi?id=765368#c3 , what systems was this actually tested on?
Comment 13•12 years ago
|
||
Coop asked me to comment about the scope of this work, as someone who knows C++
I suspect 1-3 hours, give or take if I was to do it, since I know C++ on-windows well, but not these File Manip APIs, so would spend the majority of my time learning how this code is written and MSDN pages about these APIs.
If John/someone-else-who-knows-these-apis-well, went to try this, could probably cut that time in half or more.
If we wanted to just *blindly* make the failure non-fatal by removing return FALSE; here, we could probably turnaround on a binary in 30 minutes or less for testing.
*whatever* choice we do, we need to stage this (imo on all win OS types we use, including 32-bit w2k3 builders)
Reporter | ||
Comment 14•12 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #12)
> I've reverted back to the msys version of rm.exe everywhere. I took "Let's
> deploy this" in comment #0 and comment #1 to mean we were ready to deploy
> this. Guess not. :/
>
> jhford: Given https://bugzilla.mozilla.org/show_bug.cgi?id=765368#c3 , what
> systems was this actually tested on?
I tested it as a standalone application a few different systems. I never expected that it'd overwrite the msys rm program. Was this seen while it ran on staging?
Justin, I don't have access to a windows build environment right now, but here is a patch that should remove the Wow64 stuff completely. The potential fallout is that if the program runs as a 32bit binary on a 64bit system, it will be unable to delete files that Wow64 provides (should be limited to system32, and the likes). I don't have an exhaustive list of files that are wrapped by wow64, but my guess is that they'd all be system libraries which i assume shouldn't be deleted in CI processes!
http://hg.mozilla.org/users/jford_mozilla.com/jhford-native-rm/rev/84a240176b93
This should be a matter of cloning on a win2k3 slave and running cl rm.cpp
Reporter | ||
Comment 15•12 years ago
|
||
(In reply to John Ford [:jhford] from comment #14)
> I don't have access to a windows build environment right now,
To be clear I haven't compiled this to see if it's valid.
Comment 16•12 years ago
|
||
(In reply to John Ford [:jhford] from comment #14)
> (In reply to Chris Cooper [:coop] from comment #12)
> I tested it as a standalone application a few different systems. I never
> expected that it'd overwrite the msys rm program. Was this seen while it
> ran on staging?
"Deploy" means something different to the two of us. Changing the bug summary to avoid further confusion.
For those following along at home, Callek is going to take a stab at updating the source to avoid/ignore the Wow64 problems. He'll give me something I can test in staging.
Summary: deploy winrm.exe to windows slaves → Create a Windows-native alternate to msys rm.exe to avoid common problems deleting files
Comment 18•12 years ago
|
||
To start with, Creating the .exe is easy
I created this by:
* Login to a preprod w2k8 builder as cltbld
* Launch start-msvc10
* hg clone https://hg.mozilla.org/build/tools (where this now lives)
* pushd tools/buildfarm/utils/winrm
* make
* <copy out winrm.exe to somewhere>
* popd
* rm -rf tools
---
I tossed winrm on the buildvpn in my user directory, since I've yet to get myself a people account [I might as well soon] and I think I applied the proper chmods to let others get it.
vpn1.dmz.releng.scl3.mozilla.com:/home/Callek/winrm.exe
Any problems grabbing this; just ping me.
Comment 19•12 years ago
|
||
I've grabbed the updated file and am testing it on a win64 builder in staging right now.
Comment 20•12 years ago
|
||
Hrmmm, this version doesn't handle wildcards. Deleting individual files or directories when specified precisely works fine, but the following fails, e.g.:
rm -rf obj-firefox/dist/firefox-* obj-firefox/dist/fennec* obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe
Invalid file attributes for "obj-firefox/dist/install/sea/*.exe"
Invalid file attributes for "obj-firefox/dist/seamonkey*"
Invalid file attributes for "obj-firefox/dist/fennec*"
Invalid file attributes for "obj-firefox/dist/firefox-*"
Reporter | ||
Comment 21•12 years ago
|
||
Nope, wildcards weren't specifically implemented but my understanding of the API that I used is that it supported "*" and "?" in path names [1] and would expand the wildcards as expected. I am not sure why it isn't expanding the wildcards.
[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418%28v=vs.85%29.aspx
Comment 22•12 years ago
|
||
I'm trying a new version of the tool in staging that just ignores files with bad attributes, which I'm assuming to be non-existent files or directory hierarchies. I don't know if that's a valid assumption. Please speak up if you think it's not.
In limited testing so far, the new version is successfully removing files and ignoring missing trees.
Reporter | ||
Comment 23•12 years ago
|
||
I think the problem is that del() needs to use the FindFirstFileW api on each of the paths passed in for wildcard expansion. I expected that the wildcards would have been expanded by the shell, but it might be that because it's native win32, msys doesn't munge it's argv.
Ignoring the invalid attribute results in not expanding the wildcard which in turn means that nothing is deleted.
A workaround would be to change all the rm commands where winrm would also be used to
"ls obj-firefox/dist/firefox-* obj-firefox/dist/fennec* obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe | xargs rm -rf"
that would be a no-op on linux and mac platforms. Provided that xargs and winrm play nicely, it would let msys do wildcard expansion and unix->win32 path translations.
Comment 24•12 years ago
|
||
(In reply to John Ford [:jhford] from comment #23)
> Ignoring the invalid attribute results in not expanding the wildcard which
> in turn means that nothing is deleted.
I don't think this is the case. It's not that the tool isn't expanding the wildcard, it's that one of the subdirs in the path doesn't exist, which triggers the bad attributes error. Wildcard or no, if the path doesn't exist, there was nothing to delete in the first place.
I can test this pretty trivially by hand to verify things are being deleted recursively when they should be.
> A workaround would be to change all the rm commands where winrm would also
> be used to
>
> "ls obj-firefox/dist/firefox-* obj-firefox/dist/fennec*
> obj-firefox/dist/seamonkey* obj-firefox/dist/install/sea/*.exe | xargs rm
> -rf"
Yes, but all things being equal, I would prefer to *not* need special-case logic on Windows for deleting files. We do file removals from within the build system as well, so it would be ideal if the new tool is really a drop-in replacement for rm.
Comment 25•12 years ago
|
||
Hrmm, hitting the same problems as with the stock msys rm tool on w7, which is our worst offending platform based on the volume of reports in bug 692715.
'rm' '-rf' 'tools'
in dir c:\talos-slave\test\. (timeout 1200 secs)
"tools\.hg\store\data\compare-locales" - The directory is not empty.
"tools\.hg\store\data" - The directory is not empty.
"tools\.hg\store" - The directory is not empty.
"tools\.hg" - The directory is not empty.
"tools" - The directory is not empty.
There's nothing *in* the compare-locales subdir, so maybe a lightweight retry would save us here?
Bug 712205 is the real solution for the tools dir, but doesn't help us when cleaning up other things.
Comment 26•12 years ago
|
||
I'm going to switch focus here and pursue the theory in https://bugzilla.mozilla.org/show_bug.cgi?id=692715#c2 namely that some service on the Win7 boxes is holding those files open at the wrong time. It's intermittent, but I seem to be able to trigger it in one out of every three runs via buildbot. I should be able to script something up to narrow it down.
Updated•12 years ago
|
Status: ASSIGNED → NEW
Priority: P2 → P3
Comment 27•12 years ago
|
||
Given the problems we've had with this, I think the more workable solution is to try to use the existing Windows native tools (i.e. rmdir) in the automation.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Comment 28•12 years ago
|
||
FYI, I've created a new Windows API-based directory removal routine in
http://hg.mozilla.org/build/tools/file/92c703e1c26d/clobberer/clobberer.py#l42
which may be of interest here.
Assignee | ||
Comment 29•11 years ago
|
||
I just fixed some bugs in jhford's original winrm impl, and added support for NTFS symlinks and junctions. Code is at https://github.com/vvuk/winrm
I'm currently using it on my dev machines, and am not having any problems with it so far. It rm -rf's an obj dir much faster (about 7s) than msys (~30s) or the windows explorer (~18s).
Note that the code in clobberer.py doesn't handle windows symlinks or junctions; they're both a little esoteric, but I'm hoping to be able to start using symlinks where we can. Junction points would also be good.
Comment 30•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #29)
> I just fixed some bugs in jhford's original winrm impl, and added support
> for NTFS symlinks and junctions. Code is at https://github.com/vvuk/winrm
>
> I'm currently using it on my dev machines, and am not having any problems
> with it so far. It rm -rf's an obj dir much faster (about 7s) than msys
> (~30s) or the windows explorer (~18s).
>
> Note that the code in clobberer.py doesn't handle windows symlinks or
> junctions; they're both a little esoteric, but I'm hoping to be able to
> start using symlinks where we can. Junction points would also be good.
Hey Vlad,
Thanks for the unexpected help! I can get this into staging for testing this week. Should I be able to replace the default msys rm command with this, or will I need to change the underlying calls in the automation?
FWIW, I'm not too worried about clobberer support here. The vast majority of the cases I've seen of bug 692715 have been from trying to remove the tools repo at the start of a build. Can we switch clobberer to use this code too once it's proven elsewhere?
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Assignee | ||
Comment 31•11 years ago
|
||
Yep, should just be a drop-in replacement. I copied the original to rm-orig.exe and put this one in.
The only issue I've seen so far is that long command lines cause msys to say "rm.exe: Bad file number", which doesn't happen with the original msys rm.exe. I'm not sure why that is; there isn't any explicit limit, other than Windows' regular command line limit. But maybe msys has some workarounds for long command lines? (We're talking very long here, like > 32k chars.)
Comment 32•11 years ago
|
||
(In reply to comment #31)
> Yep, should just be a drop-in replacement. I copied the original to
> rm-orig.exe and put this one in.
>
> The only issue I've seen so far is that long command lines cause msys to say
> "rm.exe: Bad file number", which doesn't happen with the original msys rm.exe.
> I'm not sure why that is; there isn't any explicit limit, other than Windows'
> regular command line limit. But maybe msys has some workarounds for long
> command lines? (We're talking very long here, like > 32k chars.)
Do you get to main() when that happens?
Comment 33•11 years ago
|
||
I tried this out in staging this week on w64-ix-slave03, but we failed in |make package-tests|:
make.py[0]: Entering directory 'e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\obj-firefox'
e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\testing\testsuite-targets.mk:437:0$ rm -rf ./dist/test-package-stage
e:\builds\moz2_slave\m-aurora-w32-00000000000000000\build\testing\testsuite-targets.mk:437:0: command 'rm -rf ./dist/test-package-stage' failed, return code 1
error getting file attributes for "./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test\browser_styleinspector_bug_677930_urls_clickable\browser_styleinspector_bug_677930_urls_clickable.css": The system cannot find the path specified.
"./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test\browser_styleinspector_bug_677930_urls_clickable": The directory is not empty.
"./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector\test": The directory is not empty.
"./dist/test-package-stage\mochitest\content\browser\browser\devtools\styleinspector": The directory is not empty.
"./dist/test-package-stage\mochitest\content\browser\browser\devtools": The directory is not empty.
"./dist/test-package-stage\mochitest\content\browser\browser": The directory is not empty.
"./dist/test-package-stage\mochitest\content\browser": The directory is not empty.
"./dist/test-package-stage\mochitest\content": The directory is not empty.
"./dist/test-package-stage\mochitest": The directory is not empty.
"./dist/test-package-stage": The directory is not empty.
The directory is still there and the slave is currently disabled if people want to poke at it. Just let me know.
Assignee | ||
Comment 34•11 years ago
|
||
I definitely do -- can I get access?
Assignee | ||
Comment 35•11 years ago
|
||
So from looking on the machine, there is no "./dist/test-package-stage" in e:\builds\moz2_slave\m-aurora-w32-00000000000000000
I'm guessing what's happening here is that there are multiple processes that are trying to remove ./dist/test-package-stage in parallel. That would be consistent with the error coming up -- one process managed to delete the file just before another did, and maybe did it in a different order or slightly different speeds, and then tried to delete all the directories.
Is there a full log available of the run? The multiple rm commands might be visible to check this.
Assignee | ||
Comment 36•11 years ago
|
||
Ah! This is because the directory exceeds PATH_MAX, and we need to use \\?\ paths to be able to have 32k long paths. I'll fix this, but it might take me a few days due to some PTO and other priorities.
Comment 37•11 years ago
|
||
Vlad: ping - don't suppose you've had a chance to poke at this in the intervening months?
If necessary and useful, we can probably hack quickly a JS implementation based on OS.File. I realize that this might be overcomplicated for the task at hand, though.
Comment 39•11 years ago
|
||
(In reply to comment #38)
> If necessary and useful, we can probably hack quickly a JS implementation based
> on OS.File. I realize that this might be overcomplicated for the task at hand,
> though.
I don't think a js version will fix the performance problem!
It is my understanding that the performance problem is due to the incorrect use of disk I/O rather than algorithms. However, booting the JS VM might kill any/some benefit.
Comment 41•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #39)
> (In reply to comment #38)
> > If necessary and useful, we can probably hack quickly a JS implementation based
> > on OS.File. I realize that this might be overcomplicated for the task at hand,
> > though.
>
> I don't think a js version will fix the performance problem!
Especially when it also needs to initialize gecko.
Assignee | ||
Comment 42•11 years ago
|
||
Oh right, I knew there was something else to do on wirm. Sorry, will look at it in the next few days (I'm working with glandium to enable symlinks in our build too).
The issue with \\?\ paths is that they can't be relative, so this just needs to turn all the arguments into absolute paths first. Not rocket science, thankfully.
Comment 43•11 years ago
|
||
(In reply to comment #41)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #39)
> > (In reply to comment #38)
> > > If necessary and useful, we can probably hack quickly a JS implementation based
> > > on OS.File. I realize that this might be overcomplicated for the task at hand,
> > > though.
> >
> > I don't think a js version will fix the performance problem!
>
> Especially when it also needs to initialize gecko.
And especially because it won't use the correct OS APIs.
Assignee | ||
Comment 44•11 years ago
|
||
So here's the interesting thing. MSYS doesn't handle long path names at all:
vladimir@wrex[2009]$ pwd
/c/proj/tmp/long/this-is-a-long-path-component-0/this-is-a-long-path-component-1/this-is-a-long-path-component-2/this-is-a-long-path-component-3/this-is-a-long-path-component-4/this-is-a-long-path-component-5/this-is-a-long-path-component-6
vladimir@wrex[2010]$ pwd | wc -c
241
vladimir@wrex[2011]$ touch this-is-a-long-filename.txt
touch: cannot touch `this-is-a-long-filename.txt': File or path name too long
Python doesn't like this either:
>>> import os
>>> os.mkdir("this-is-a-long-path-name-here")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
WindowsError: [Error 3] The system cannot find the path specified: 'this-is-a-long-path-name-here'
>>> os.mkdir("foo")
>>> quit()
("foo" works since it's under the 260 char limit)
*Explorer* doesn't like this -- trying to create a new folder in explorer gives me a dialog that says "The file name(s) would be too long for the destination folder. You can shorten the file name and try again, or try a location that has a shorter path. [OK]"... and this is just doing "New Folder", so the default new folder name is too long.
So how are we getting away with creating these paths in our build in the first place?
However, for this rm implementation, it turns out that the Windows GetFullPathName function is "dumb", for better or worse. It will happily construct a path longer than MAX_PATH. This means that I can use it to make pathnames absolute and \\?\ before feeding in to further things.
Assignee | ||
Comment 45•11 years ago
|
||
Okay, I implemented this in https://github.com/vvuk/winrm
Binary is at http://people.mozilla.com/~vladimir/misc/winrm.exe
This really could use a test suite :p Also I haven't been able to directly test the long path name removal since I can't actually successfully create such a path, not easily anyway. Note that I ran into problems with mozmake.exe with long path names in our testing dirs too, so we may end up in a spot where we just need to avoid excessively long path names. Windows gives us 260 chars to work with; if we can keep our tree-relative paths "mozilla-central/..." to < 200, that should give us plenty of chars for people's root dirs. But that's a bit longer work. We haven't exactly tried to minimize path usage, though it's our tests that are the biggest (only?) culprit.
Comment 46•11 years ago
|
||
You should be able to call CreateDirectoryW to get such long path names, right?
Comment 47•11 years ago
|
||
Vladimir, here is the script I used for manual testing of my Python-based long path removal script (bug 860426).
Comment 48•11 years ago
|
||
fixed copy/paste error in original patch
Attachment #8349526 -
Attachment is obsolete: true
Assignee | ||
Comment 50•11 years ago
|
||
I created a C++ testcase (that script didn't actually create anything, and I didn't have win32file ;)... seemed to work fine with very long paths (500+ chars).
Comment 51•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #45)
> Binary is at http://people.mozilla.com/~vladimir/misc/winrm.exe
I'm testing this version in staging right now.
Comment 52•11 years ago
|
||
No issues in staging over the holidays. We ran a range of nightlies, dep, PGO, and repack builds.
I'm going to manually deploy this (reversibly) to a handful of production slaves, say 10, and monitor them for a day under normal load. If we don't see any deletion issues, I can deploy to the rest of the pool and file a relops bug to get this added to GPO.
Comment 53•11 years ago
|
||
I've installed the new rm.exe on w64-ix-slave[40-49]. I'll keep an eye on them until tomorrow and then assess for wider deployment.
Also letting armenzg know as buildduty.
Comment 54•11 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #53)
> I've installed the new rm.exe on w64-ix-slave[40-49]. I'll keep an eye on
> them until tomorrow and then assess for wider deployment.
>
> Also letting armenzg know as buildduty.
No issues observed in build or try. Starting deployment to all Windows build/try slaves.
Assignee | ||
Comment 55•11 years ago
|
||
Great -- has there been any observed performance win, out of curiosity?
Comment 56•11 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #54)
> No issues observed in build or try. Starting deployment to all Windows
> build/try slaves.
winrm has been deployed on all the w64-ix slaves. Filed bug 958229 for the small VLAN inconsistency I found while doing so.
Still need to file the follow-up to get this change into GPO.
Also need to test winrm on the test slaves, but the lack of clobber requests and associated source trees on those machines makes that less pressing.
Comment 57•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #55)
> Great -- has there been any observed performance win, out of curiosity?
Yes, there is a performance win. Sorry, it would have been good to report on that.
For simpler tasks like removing the tools dir, I see about a 50% speedup (10s instead of 20s).
For bigger tasks like removing the build dir at the end of the nightly, we save about 2 minutes, or a 25% speedup (360s instead of 480s).
Comment 58•11 years ago
|
||
Running into some problems with |make package-tests|, but only on mozilla-beta. Wondering whether there is a build system issue here.
https://tbpl.mozilla.org/?tree=Mozilla-Beta&rev=46cd787239b7
If we need to back this change out, we can log into each slave and execute the following:
cd /bin
cp rm-orig.exe rm.exe
If a straight copy doesn't work, you can use runas:
cd /bin
runas /User:Administrator "cp /bin/rm-orig.exe /bin/rm.exe"
If that doesn't work, VNC into the slave, right-click the command prompt and run as administrator, then run:
c:\mozilla-build\msys\bin\bash.exe --login -i
cd /bin
cp rm-orig.exe rm.exe
Comment 59•11 years ago
|
||
No surprise, my retrigger on esr24 failed the same way, so if we decide to wait it out we'll be waiting until esr24 dies.
Depends on: 939719
Comment 60•11 years ago
|
||
This has been reverted due to the failures on beta, release, and esr. :(
I'll start running against beta in staging to diagnose.
Comment 61•11 years ago
|
||
I've applied the patch from bug 939719 against a copy of mozilla-beta and am testing a build now. The patch applied cleanly with minimal fuzz, so I'm optimistic we can get this uplifted if it works.
Comment 62•11 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #61)
> I've applied the patch from bug 939719 against a copy of mozilla-beta and am
> testing a build now. The patch applied cleanly with minimal fuzz, so I'm
> optimistic we can get this uplifted if it works.
Patch applies cleanly with fuzz on mozilla-release. There are rejects on mozilla-esr24, but it looks like the code has just been reorganized. I'm going to test some basic builds on mozilla-esr24 with the patch and then request uplift everywhere.
Comment 63•11 years ago
|
||
After I started using the correct rev1 w64 slave for the esr24 jobs, things went a lot better.
I'll start making uplift requests for bug 939719 to beta, release, and esr24 today.
Comment 64•11 years ago
|
||
After some hiccups on esr24, uplift is complete. I'm planning to re-deploy winrm tomorrow AM EST.
Assignee | ||
Comment 65•11 years ago
|
||
Note one bug I just found -- removing a dir via "rm -rf foo/" (note the trailing slash) errors out. I'll fix this shortly. Wonder if that was maybe contributing to the earlier problem?
Comment 66•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65)
> Note one bug I just found -- removing a dir via "rm -rf foo/" (note the
> trailing slash) errors out. I'll fix this shortly. Wonder if that was
> maybe contributing to the earlier problem?
Quite possibly. I'm still working through some Thunderbird builds prior to deploying, despite what I said in comment #64.
Comment 67•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65)
> Note one bug I just found -- removing a dir via "rm -rf foo/" (note the
> trailing slash) errors out. I'll fix this shortly.
OK, holding off on any deployment plans until the new version is ready. This seems like a case we could trip over quite easily.
Comment 68•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #65)
> Note one bug I just found -- removing a dir via "rm -rf foo/" (note the
> trailing slash) errors out. I'll fix this shortly.
Vlad: any idea when you'll have a chance to revisit winrm?
Flags: needinfo?(vladimir)
Assignee | ||
Comment 69•11 years ago
|
||
Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately. :/
Flags: needinfo?(vladimir)
Assignee | ||
Comment 70•11 years ago
|
||
I would also love to create a test suite, but do you have any idea how annoying it is to think about a test suite for "rm"?! :)
Comment 71•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #69)
> Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately.
> :/
Re-pinging after MWC.
Comment 72•11 years ago
|
||
(In reply to Chris Cooper [:coop] [away until April 3] from comment #71)
> (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #69)
> > Ideally in the next few days -- MWC/1.3/etc. stuff took over unfortunately.
> > :/
>
> Re-pinging after MWC.
Re-assigning and re-pinging.
Vlad: can you suggest someone else to follow-up with here if you can't get to this?
Assignee: coop → vladimir
Comment 73•11 years ago
|
||
I took a stab at a patch. There's a pending pull request:
https://github.com/vvuk/winrm/pull/1
Assignee | ||
Comment 74•11 years ago
|
||
Sorry, I actually fixed this a few weeks ago and forgot to push/post :/
The github repo is updated, as well as http://people.mozilla.com/~vladimir/misc/winrm.exe .
Assignee | ||
Comment 76•11 years ago
|
||
I'm not sure this will fix bug 981298 fwiw -- every once in a while windows does get in a weird NTFS state where it thinks something has a lock/is using a directory, when nothing is. I never figured out what was actually causing it and how to fix it, short of a reboot.
Comment 77•11 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #76)
> I'm not sure this will fix bug 981298 fwiw -- every once in a while windows
> does get in a weird NTFS state where it thinks something has a lock/is using
> a directory, when nothing is. I never figured out what was actually causing
> it and how to fix it, short of a reboot.
I'll still take the 70% speedup this gives us on file deletions in the pathological cases (3 levels of nested m-c clones), based on my testing today.
I'll start deployment this week and make sure this gets into GPO.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 78•11 years ago
|
||
Awesome! For bug 981298, that's not caused by this; no idea how things ever get in that state. Should be independent of rm implementations.
Comment 79•10 years ago
|
||
Sigh.
We hit bug 1013705 in the release process, and we're currently working around it with https://bugzilla.mozilla.org/show_bug.cgi?id=1013700#c2.
I've managed to reproduce it in staging too.
command: START
command: rm -rf /c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current
command: cwd: c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000
command: output:
command: END (0.00s elapsed)
command: START
command: mkdir /c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current
command: cwd: c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000
command: output:
command: ERROR
Traceback (most recent call last):
File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\util\commands.py", line 47, in run_cmd
return subprocess.check_call(cmd, **kwargs)
File "c:\mozilla-build\python27\lib\subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['mkdir', '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current']' returned non-zero exit status 1
command: END (0.01s elapsed)
The following tracebacks were detected during repacks:
fr:
Traceback (most recent call last):
File "c:/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/scripts/scripts/l10n/create-release-repacks.py", line 124, in createRepacks
buildNumber=buildNumber, stageServer=stageServer)
File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\build\l10n.py", line 183, in repackLocale
run_cmd(['mkdir', current])
File "c:\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\scripts\lib\python\util\commands.py", line 47, in run_cmd
return subprocess.check_call(cmd, **kwargs)
File "c:\mozilla-build\python27\lib\subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['mkdir', '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current']' returned non-zero exit status 1
Full log is here for those with access:
http://dev-master1.srv.releng.scl3.mozilla.com:8044/builders/release-mozilla-beta-win32_standalone_repack/builds/6/steps/run_script/logs/stdio
Logging into the slave, the "current" dir still seems to have all/most of its contents. Not sure why it's returning so quickly here without actually completing the action.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 80•10 years ago
|
||
I've disabled the slave in question if anyone wants to take a look.
Running the same rm command from the login shell (ssh) works as expected, of course.
Comment 81•10 years ago
|
||
Interesting. winrm doesn't even seem to be finding the directory. The corresponding mkdir call is expanding out the path as follows:
Expanded out '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current' => '\\?\c:\c\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\mozilla-beta\obj-l10n\dist\current'
mkdir: cannot create directory `/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/dist/current': File exists
So it seems that winrm is unable to munge the path in the same way in this context.
Comment 82•10 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #81)
> Expanded out
> '/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/
> dist/current' =>
> '\\?\c:\c\builds\moz2_slave\rel-m-beta-w32_sa_rpk-00000000\mozilla-beta\obj-
> l10n\dist\current'
> mkdir: cannot create directory
> `/c/builds/moz2_slave/rel-m-beta-w32_sa_rpk-00000000/mozilla-beta/obj-l10n/
> dist/current': File exists
Nevermind. That's actually poorly buffered output from running winrm with the verbose option. So it looks like winrm is doing path translation here when it doesn't need to.
Comment 83•10 years ago
|
||
It's transforming /c/foo in \\?\c:\c\foo (notice the two "c")
Comment 84•10 years ago
|
||
How is winrm built? Is it built with msys gcc? That would explain it: msys doesn't translate msys paths to windows paths when executing msys applications.
Assignee | ||
Comment 85•10 years ago
|
||
It's built with msvc; it doesn't touch or link to msys at all. I noticed the c:\c\ -- that's the problem. Which make/shell is running this? Whatever is executing winrm needs to translate paths to normal windows-style paths; winrm doesn't understand msys/cygwin paths.
I mean, I can *make* it do so, with a command line option maybe.. but our build system is a quagmire of different types of paths, it would be better if we could just understand what's causing wrong paths to be generated.
Comment 86•10 years ago
|
||
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #85)
> I mean, I can *make* it do so, with a command line option maybe.. but our
> build system is a quagmire of different types of paths, it would be better
> if we could just understand what's causing wrong paths to be generated.
The short answer is "l10n," but I'm happy to fix that script over in bug 1013705 instead of this tool.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Comment 87•10 years ago
|
||
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #8)
> Perhaps we want to file a bug for MozillaBuild to take this change too?
I filed bug 1025099.
Updated•7 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•