Closed
Bug 475276
Opened 16 years ago
Closed 15 years ago
Some of l10n mozilla-central directory not removed on clobber, breaks subsequent builds
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ehsan.akhgari, Assigned: coop)
References
Details
Attachments
(3 files, 4 obsolete files)
830 bytes, patch | catlee: review+ | nthomas: checked-in+
4.04 KB, patch | catlee: review+ | coop: checked-in+
522 bytes, patch | catlee: review+
This has happened several times. Example logs:
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1232766067.1232766095.8031.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1232852472.1232852500.24900.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1232895733.1232895749.4148.gz
Error message:
autoconf: configure.in: No such file or directory
Comment 1•16 years ago
|
||
Looks like we had a glitch with moz2-win32-slave11's mozilla-central checkout, which caused this problem for a bunch of locales. Hard to say what happened because the cleanup script has since removed the working directory, but we were getting this on hg update of m-c:
abort: repository default not found!
I've seen that when the dir for mozilla-central is empty (unknown cause, CC'ing catlee in case the cleanup script has a glitch). I was touching all these machines last night and may have screwed this one up in the process, apologies if so. Should come right on the next scheduled build (1900 PST), please reopen if it happens again.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 2•15 years ago
|
||
It's happening again: <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234580497.1234580532.24646.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234537282.1234537292.29301.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234494134.1234494152.10702.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234364492.1234364503.4805.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234105232.1234105239.30511.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1234062051.1234062065.25337.gz> ...
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 3•15 years ago
|
||
The next set of win32 mozilla-central l10n builds will clobber their working dirs before building. There are at least a couple of slaves with bogus mozilla-central checkouts (01 and 09), mostly empty but a few files and the .hg dir remaining. Please reopen if that doesn't help, or the problem comes back.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 15 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 4•15 years ago
|
||
This has started to happen again several times today <http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla-l10n-fa>: <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1237035608.1237035619.21995.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1237035599.1237035612.21981.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1237035590.1237035605.21963.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1237035571.1237035588.21881.gz>
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5•15 years ago
|
||
Those four are all on moz2-win32-slave08, but moz2-win32-slave16 is also broken. I've fixed both slaves up, and will leave this open to figure out the cause. We're ending up with /build/moz2-slave/mozilla-central-win32-l10n-nightly/build/mozilla-central containing (only) these three files:
./.hg/store/data/toolkit/crashreporter/google-breakpad/src/processor/testdata/symbols/kernel32.pdb/_b_c_e8785_c57_b44245_a669896_b6_a19_b9542/kernel32.sym.i
./.hg/store/data/toolkit/crashreporter/google-breakpad/src/processor/testdata/symbols/test__app.pdb/5_a9832_e5287241_c1838_e_d98914_e9_b7_f_f1/test__app.sym.d
./.hg/store/data/toolkit/crashreporter/google-breakpad/src/processor/testdata/symbols/test__app.pdb/5_a9832_e5287241_c1838_e_d98914_e9_b7_f_f1/test__app.sym.i
Mode is 644, sensible file ownership. Something must be failing to clobber properly but it'll take a bit of digging to find out what.
Summary: Build failure on WINNT 5.2 mozilla-central l10n nightly (fa) → Some of l10n mozilla-central directory not removed on clobber, breaks subsequent builds
Comment 6•15 years ago
|
||
Might be a consequence of the ignoreErrors here:
http://hg.mozilla.org/build/tools/file/27453eb43283/buildfarm/maintenance/purge_builds.py#l50
catlee, do you recall why you added that option when writing this script in bug 464103?
Comment 7•15 years ago
|
||
I know we're investigating whether there's some machine weirdness going on. But separate to that, I'm curious: this bug only talks about problems with the "fa" locale. If this was all machine-specific, shouldn't we also see problems with other locales processed on the same machine?
Comment 8•15 years ago
|
||
(In reply to comment #7) Did you see the original summary of this bug? It would have affected other locales if they were given to broken slaves, but it seems unlikely Ehsan will be watching more than his own locale.
Reporter | ||
Comment 9•15 years ago
|
||
(In reply to comment #8)
> (In reply to comment #7)
> Did you see the original summary of this bug? It would have affected other locales
> if they were given to broken slaves but seems unlikely Ehsan will be watching
> more than his own locale.
That's right, I've never watched any other l10n tinderbox. Comparing <http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla-l10n-fa&maxdate=1237066261&legend=0&norules=1> and <http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla-l10n&maxdate=1237066261&legend=0&norules=1>, I see that there are a lot of similar failures on the same column, and I'm sure that not all of them have been with fa, but I couldn't tell which other locales have been affected, as the build logs don't seem to show any problem.
Comment 10•15 years ago
|
||
(In reply to comment #8)
> (In reply to comment #7)
> Did you see the original summary of this bug?
Yep, I did. The original summary also talked about "fa" only.
> It would have affected other locales
> if they were given to broken slaves but seems unlikely Ehsan will be watching
> more than his own locale.
Yes, and I saw your comments about the clobber step not doing what was expected. I understand that we expect this to be breaking other locales, but if this has been happening since 25jan2009, it strikes me as odd that no other locales have noticed/reported it yet. Hence my question and cc-ing Axel.
Reporter | ||
Comment 11•15 years ago
|
||
Trying other locales randomly, I saw these two failures with nl: <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-nl/1237052911.1237052955.32256.gz> <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-nl/1237041526.1237041621.8058.gz> This should be enough to show that it's not locale specific.
Reporter | ||
Comment 12•15 years ago
|
||
(BTW, I checked three or four locales)
Comment 13•15 years ago
|
||
(In reply to comment #11) > Trying other locales randomly, I saw these two failures with nl: > > <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-nl/1237052911.1237052955.32256.gz> > <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-nl/1237041526.1237041621.8058.gz> > > This should be enough to show that it's not locale specific. Perfect, thanks Ehsan, exactly the data I was looking for.
Updated•15 years ago
|
Assignee: nobody → ccooper
Assignee | ||
Comment 14•15 years ago
|
||
catlee: any thoughts on comment 6?
Status: REOPENED → ASSIGNED
OS: Windows Vista → Windows Server 2003
Priority: -- → P2
Comment 15•15 years ago
|
||
(In reply to comment #14) > catlee: any thoughts on comment 6? I ignored errors because it seemed better to delete as many files as possible, instead of failing on the first error and possibly turning the build orange/red. Perhaps purge_builds.py should list any files/directories it failed to delete, and then turn the build orange so that we can investigate?
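A minimal sketch of the reporting idea above, assuming Python 3 and a hypothetical helper name (`purge_and_report` is not part of purge_builds.py). It keeps the delete-as-much-as-possible behaviour, then lists whatever survived so the caller can decide how to flag the build:

```python
import os
import shutil


def purge_and_report(path):
    """Delete as much of `path` as possible and return the paths left behind.

    Hypothetical helper for illustration only.
    """
    # Same spirit as ignoring errors: keep deleting past individual failures.
    shutil.rmtree(path, ignore_errors=True)

    if not os.path.lexists(path):
        return []

    # Something survived: report the root plus everything still under it.
    leftovers = [path]
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            leftovers.append(os.path.join(dirpath, name))
    return leftovers
```

A caller could print the returned list and choose its exit status from it. How that status maps to an orange or red build depends on the Buildbot step configuration, which this sketch does not cover.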
Assignee | ||
Comment 16•15 years ago
|
||
I did some digging. shutil.rmtree will apparently fail on Windows if it hits any files that are read-only. The Subversion guys actually ran into this, so one of their cleanup scripts sets the entire tree to r/w before running shutil.rmtree. ref: http://svn.collab.net/repos/svn/trunk/tools/backup/hot-backup.py.in
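For illustration, the make-everything-writable-first approach could look roughly like this. It is a sketch assuming Python 3, with hypothetical function names, and is not taken from the Subversion script or from any patch on this bug:

```python
import os
import shutil
import stat


def make_tree_writable(root):
    """Give the owner write access to everything under `root` (symlinks skipped)."""
    os.chmod(root, os.stat(root).st_mode | stat.S_IRWXU)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                # Fix the directory before os.walk descends into it.
                os.chmod(full, os.stat(full).st_mode | stat.S_IRWXU)
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                os.chmod(full, os.stat(full).st_mode | stat.S_IWUSR)


def force_rmtree(root):
    """Clear read-only bits across the tree, then remove it."""
    make_tree_writable(root)
    shutil.rmtree(root)
```

Unlike ignoring errors, `force_rmtree` still raises if the removal fails for some other reason, such as a file held open by another process.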
Comment 17•15 years ago
|
||
This reminds me, Buildbot works around this with this code: http://mxr.mozilla.org/mozilla/source/tools/buildbot/buildbot/slave/commands.py#61
Assignee | ||
Comment 18•15 years ago
|
||
Attachment #367646 -
Flags: review?(catlee)
Assignee | ||
Comment 19•15 years ago
|
||
Our import story is sketchy on Windows, so easiest to just add the required function to the script.
Attachment #367646 -
Attachment is obsolete: true
Attachment #367649 -
Flags: review?(catlee)
Attachment #367646 -
Flags: review?(catlee)
Updated•15 years ago
|
Attachment #367649 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 20•15 years ago
|
||
Comment on attachment 367649 [details] [diff] [review] Use rmdirRecursive instead of shutil.rmtree, v2 changeset: 243:0f24104fe0a7
Attachment #367649 -
Flags: checked‑in+
Assignee | ||
Updated•15 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Comment 21•15 years ago
|
||
Just caught a Windows box trying to delete its current build dir:
C:\WINDOWS\system32\cmd.exe /c python tools/buildfarm/maintenance/purge_builds.py -s 7 -n info -n repo_setup -n tag -n source -n updates -n final_verification -n l10n_verification -n macosx_update_verify -n macosx_build -n macosx_repack -n win32_update_verify -n win32_build -n win32_repack -n linux_update_verify -n linux_build -n linux_repack ..
in dir e:\builds\moz2_slave\mozilla-central-win32\. (timeout 3600 secs)
...
Deleting ..\mozilla-central-win32
Traceback (most recent call last):
File "tools/buildfarm/maintenance/purge_builds.py", line 111, in <module>
File "tools/buildfarm/maintenance/purge_builds.py", line 82, in purge
File "tools/buildfarm/maintenance/purge_builds.py", line 65, in rmdirRecursive
WindowsError: [Error 13] The process cannot access the file because it is being used by another process: '..\\mozilla-central-win32'
program finished with exit code 1
http://production-master.build.mozilla.org:8010/builders/WINNT%205.2%20mozilla-central%20build/builds/7779/steps/shell_5/logs/stdio
Went on to blow up the build by running out of space. Perhaps we should be appending -n <current build> to this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 22•15 years ago
|
||
And this on linux:
python tools/buildfarm/maintenance/purge_builds.py -s 5 -n info -n repo_setup -n tag -n source -n updates -n final_verification -n l10n_verification -n macosx_update_verify -n macosx_build -n macosx_repack -n win32_update_verify -n win32_build -n win32_repack -n linux_update_verify -n linux_build -n linux_repack ..
in dir /builds/moz2_slave/mozilla-1.9.1-linux/. (timeout 3600 secs)
....
Deleting ../tracemonkey-linux
Traceback (most recent call last):
File "tools/buildfarm/maintenance/purge_builds.py", line 111, in <module>
purge(args[0], options.size, options.skip, options.dry_run)
File "tools/buildfarm/maintenance/purge_builds.py", line 82, in purge
rmdirRecursive(d)
File "tools/buildfarm/maintenance/purge_builds.py", line 61, in rmdirRecursive
rmdirRecursive(full_name)
File "tools/buildfarm/maintenance/purge_builds.py", line 61, in rmdirRecursive
rmdirRecursive(full_name)
File "tools/buildfarm/maintenance/purge_builds.py", line 61, in rmdirRecursive
rmdirRecursive(full_name)
File "tools/buildfarm/maintenance/purge_builds.py", line 61, in rmdirRecursive
rmdirRecursive(full_name)
File "tools/buildfarm/maintenance/purge_builds.py", line 63, in rmdirRecursive
os.chmod(full_name, 0700)
OSError: [Errno 2] No such file or directory: '../tracemonkey-linux/build/configs/mozilla2/linux/mozilla-1.9.1'
program finished with exit code 1
Probably related to this symlink:
mozilla-1.9.1@ -> mozilla-central
Updated•15 years ago
|
Attachment #367649 -
Flags: checked‑in+ → checked‑in-
Comment 23•15 years ago
|
||
Comment on attachment 367649 [details] [diff] [review] Use rmdirRecursive instead of shutil.rmtree, v2 Backed this out for bustage in previous comments. changeset: 244:d8b5cee9aff2
Assignee | ||
Comment 24•15 years ago
|
||
(In reply to comment #22)
> Probably related to this symlink mozilla-1.9.1@ -> mozilla-central
This we can fix by checking the link before we chmod, e.g.: "if not os.path.lexists(full_name)"
(In reply to comment #21)
> Went on to blow up the build by running out of space. Perhaps we should be
> appending -n <current build> to this.
Yeah, we were ignoring errors so this never bit us previously. I'll whip up a new patch shortly.
Assignee | ||
Comment 25•15 years ago
|
||
This fixes up how we handle symlinks so that we don't skip over them at the outset and don't try to chmod them later on. Need a separate buildbotcustom patch to avoid trying to delete the current working dir.
Attachment #367649 -
Attachment is obsolete: true
Attachment #367707 -
Flags: review?(nthomas)
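As a rough illustration of the symlink handling described in comment 25, here is a sketch of a recursive delete that removes links themselves and never chmods through them. It assumes Python 3 and is not the attached patch. It uses `os.path.islink` for the check, since `os.path.lexists` only reports that a path exists and is also true for a dangling link:

```python
import os


def rmdir_recursive(path):
    """Remove `path` and its contents without following symlinks (sketch)."""
    if os.path.islink(path) or not os.path.isdir(path):
        # A plain file or a link: remove the entry itself.
        os.remove(path)
        return

    os.chmod(path, 0o700)
    for name in os.listdir(path):
        full_name = os.path.join(path, name)
        if os.path.islink(full_name):
            # chmod follows links, so a dangling link would raise ENOENT;
            # delete the link and leave its target alone.
            os.remove(full_name)
        elif os.path.isdir(full_name):
            rmdir_recursive(full_name)
        else:
            # Clear a possible read-only bit before deleting the file.
            os.chmod(full_name, 0o600)
            os.remove(full_name)
    os.rmdir(path)
```

With this ordering, a link such as mozilla-1.9.1 -> mozilla-central is deleted as a directory entry, so a missing or already-removed target cannot raise the OSError shown in comment 22.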
Assignee | ||
Updated•15 years ago
|
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 26•15 years ago
|
||
I've tested the basename command on linux/mac/win32, and ran the build step on staging-master2 (although only against mac) to verify it works correctly.
Attachment #367780 -
Flags: review?(catlee)
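The idea of skipping the current build dir can be sketched as follows. This is not the buildbotcustom patch (which sets a builddir property from a basename command); it is a hypothetical helper, `purge_command`, that builds the same kind of command line seen in comments 21 and 22, with the current directory's basename appended as one more -n entry:

```python
import os

PURGE_SCRIPT = "tools/buildfarm/maintenance/purge_builds.py"


def purge_command(min_free_gb, skip_names, build_dir):
    """Return a purge_builds.py argument list that never targets `build_dir`.

    Hypothetical helper; only the -s and -n flags and the trailing ".."
    are taken from the logs in this bug.
    """
    # normpath drops a trailing separator so basename yields the dir name.
    current = os.path.basename(os.path.normpath(build_dir))
    names = list(skip_names)
    if current not in names:
        names.append(current)

    args = ["python", PURGE_SCRIPT, "-s", str(min_free_gb)]
    for name in names:
        args += ["-n", name]
    args.append("..")
    return args
```

For the Linux slave in comment 22, `purge_command(5, ["info", "tag"], "/builds/moz2_slave/mozilla-1.9.1-linux/")` would add `-n mozilla-1.9.1-linux` before the trailing `..`.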
Assignee | ||
Comment 27•15 years ago
|
||
Comment on attachment 367707 [details] [diff] [review] Use rmdirRecursive instead of shutil.rmtree, v3 Switching reviewer...no offense intended to nthomas!
Attachment #367707 -
Flags: review?(nthomas) → review?(catlee)
Updated•15 years ago
|
Attachment #367780 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 28•15 years ago
|
||
Comment on attachment 367780 [details] [diff] [review] Set the builddir property, and then add it to the list of dirs to ignore. changeset: 224:c14c4daed3bc
Attachment #367780 -
Flags: checked‑in+
Updated•15 years ago
|
Attachment #367707 -
Flags: review?(catlee) → review+
Comment 29•15 years ago
|
||
Comment on attachment 367707 [details] [diff] [review] Use rmdirRecursive instead of shutil.rmtree, v3 can we test this on staging first?
Assignee | ||
Comment 30•15 years ago
|
||
(In reply to comment #29) > (From update of attachment 367707 [details] [diff] [review]) > can we test this on staging first? Sure thing. I'll get it running there this aft/tonight.
Assignee | ||
Updated•15 years ago
|
Attachment #367707 -
Flags: checked‑in+
Assignee | ||
Comment 31•15 years ago
|
||
Comment on attachment 367707 [details] [diff] [review] Use rmdirRecursive instead of shutil.rmtree, v3 changeset: 245:282a97d732d5
Comment 32•15 years ago
|
||
clobberer.py suffers from the same problem. See bug 483943 for details there. Solution sounds like switching clobberer.py to using rmdirRecursive as well.
Assignee | ||
Updated•15 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Comment 33•15 years ago
|
||
Still having problems with some of the files mentioned in comment #5. I speculated that it might be a windows path length problem, but it only seems to add up to 230 characters, which is well below the 255 limit.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
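The path-length arithmetic can be checked with a few lines of Python. The relative paths are the three leftover files from comment #5; the slave root is an assumption borrowed from the log in comment 21 and may differ per machine. The 255 figure is the one used in this comment, while Windows' documented MAX_PATH is 260:

```python
import ntpath

# Assumption: slave root as seen in the comment 21 log; may differ per slave.
ASSUMED_SLAVE_ROOT = "e:\\builds\\moz2_slave"

CHECKOUT = ["mozilla-central-win32-l10n-nightly", "build", "mozilla-central"]
SYMBOLS_DIR = [".hg", "store", "data", "toolkit", "crashreporter",
               "google-breakpad", "src", "processor", "testdata", "symbols"]
LEFTOVERS = [
    ["kernel32.pdb", "_b_c_e8785_c57_b44245_a669896_b6_a19_b9542", "kernel32.sym.i"],
    ["test__app.pdb", "5_a9832_e5287241_c1838_e_d98914_e9_b7_f_f1", "test__app.sym.d"],
    ["test__app.pdb", "5_a9832_e5287241_c1838_e_d98914_e9_b7_f_f1", "test__app.sym.i"],
]


def full_paths(root=ASSUMED_SLAVE_ROOT):
    """Build Windows-style absolute paths for the leftover files."""
    return [ntpath.join(root, *(CHECKOUT + SYMBOLS_DIR + tail)) for tail in LEFTOVERS]


def longest_path_length(root=ASSUMED_SLAVE_ROOT):
    """Length in characters of the longest leftover path under `root`."""
    return max(len(p) for p in full_paths(root))
```

Comparing `longest_path_length()` against both 255 and 260 shows how much headroom remains under the assumed root. A deeper root would eat into it.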
Comment 34•15 years ago
|
||
See bug 396187 for a 2^8 length limit we've hit in the past, although we're getting a different WindowsError than what Rob Helmer mentioned in bug 396187 comment #18 ('The system cannot find the path specified' here vs 'the filename or extension is too long').
Comment 35•15 years ago
|
||
Something else is also going on, e.g. from the m-1.9.1 nightly on moz2-win32-slave06 today:
1. On the nightly build it failed with
Deleting ..\mozilla-1.9.1-win32-l10n-nightly
...
File "tools/buildfarm/maintenance/purge_builds.py", line 61, in rmdirRecursive
os.chmod(full_name, 0600)
WindowsError: [Error 2] The system cannot find the path specified: '..\\mozilla-1.9.1-win32-l10n-nightly\\build\\mozilla-1.9.1\\.hg\\store\\data\\toolkit\\crashreporter\\google-breakpad\\src\\processor\\testdata\\symbols\\test__app.pdb\\5_a9832_e5287241_c1838_e_d98914_e9_b7_f_f1\\test__app.sym.d'
2. Then it did a bunch of l10n builds; the first didn't need to clean up any space so I'm assuming the later ones didn't either.
3. Then we get a mozilla-1.9.1 dep build,
Deleting ..\mozilla-1.9.1-win32-l10n-nightly
Deleting ..\mozilla-central-win32-unittest
File "tools/buildfarm/maintenance/purge_builds.py", line 70, in rmdirRecursive
os.rmdir(dir)
WindowsError: [Error 41] The directory is not empty: '..\\mozilla-central-win32-unittest\\build\\modules\\freetype2\\src\\tools'
Two things of note here:
a) it's removed the l10n dir it failed at last time, perhaps one of the l10n builds fixed it somehow
b) there's a new type of failure removing the unittest dir. This slave last did that type of build at Fri Mar 20 17:28:37 2009 (build 2150) and hit errors in most of the test suites. There are no processes hanging around at this point, several hours after the dir removal was attempted.
The next non-l10n build removed it, and had
Deleting ..\mozilla-central-win32-unittest
9.26 GB of space available
Comment 36•15 years ago
|
||
Comment on attachment 367780 [details] [diff] [review] Set the builddir property, and then add it to the list of dirs to ignore. Given the continuing problems I'm going to back this out again. Occasionally broken l10n is a better situation than a much wider range of builds running out of space. Hopefully we can get this working reliably in staging.
Attachment #367780 -
Flags: checked‑in+ → checked‑in-
Comment 37•15 years ago
|
||
Comment on attachment 367780 [details] [diff] [review] Set the builddir property, and then add it to the list of dirs to ignore. Sorry, munged the wrong attachment.
Attachment #367780 -
Flags: checked‑in- → checked‑in+
Updated•15 years ago
|
Attachment #367707 -
Flags: checked‑in+ → checked‑in-
Assignee | ||
Updated•15 years ago
|
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 38•15 years ago
|
||
I'll be playing around with a version of purge_builds.py in my user repository (http://hg.mozilla.org/users/coop_mozilla.com/build-tools) in staging this week. Ideally, I'd like to catch a windows slave in the act so I can poke at those .d files and find out why they think they're so special.
Reporter | ||
Comment 39•15 years ago
|
||
This happened today again on fa tinderbox: <http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-fa/1239892177.1239892205.27410.gz>
Comment 40•15 years ago
|
||
Looks like coop set some clobbers to resolve this problem.
Assignee | ||
Comment 41•15 years ago
|
||
clobberer.py has the same problem (bug 483943). joduinn and I talked today about moving (renaming) the directory prior to a clobber. The problem we're seeing now is that undeleted files hold the whole checkout dir open so our buildsteps try to update a mostly empty dir rather than clone a new copy. Moving the old dir out of the way prior to attempting to delete anything should solve that, even if part of the delete fails. One possible problem I foresee is if a clobber attempts to move a dir when the previous clobber failed so the renamed dir already exists. Do we need a system process looking for failed clobber dirs to clean out? We can always increment the to-be-moved dirname to avoid collisions, but I wouldn't want that to get out of control, i.e. end up with many mostly-empty checkout dirs.
Assignee | ||
Comment 44•15 years ago
|
||
Running with a new patch in staging right now that moves top-level files/directories out of the way prior to deleting them, with appropriate checks to make sure we aren't piling up undeleted files/directories.
Assignee | ||
Comment 45•15 years ago
|
||
purge_builds.py:
- re-add rmdirRecursive()
- wrap actual purging in a try block
Both:
- rename top-level files/dir prior to deleting them
- check for existing renamed files/dirs prior to deletion and delete them first
Has run in staging for a few days now without incident.
Attachment #367707 -
Attachment is obsolete: true
Attachment #382760 -
Flags: review?(catlee)
Assignee | ||
Comment 46•15 years ago
|
||
Same as previous patch, but moves the try block for purge_builds.py inside the loop, and prints any exceptions to stdout so that we still have a fighting chance of freeing the needed space.
Attachment #382760 -
Attachment is obsolete: true
Attachment #382789 -
Flags: review?(catlee)
Attachment #382760 -
Flags: review?(catlee)
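A rough sketch of the move-aside approach from comments 41 to 46, assuming Python 3. The suffix value and the function names are hypothetical and the attached patch may differ. Each entry is renamed first, so a partial delete can no longer leave a mostly empty checkout at the original path, and the try block sits inside the loop so one failure does not stop the rest:

```python
import os
import shutil
import sys

# Hypothetical suffix; the value used by the real scripts is not shown here.
CLOBBER_SUFFIX = ".deleteme"


def remove_path(path):
    """Delete a file, a symlink, or a whole directory tree."""
    if os.path.islink(path) or not os.path.isdir(path):
        os.remove(path)
    else:
        shutil.rmtree(path)


def clobber_entries(parent, names):
    """Rename each entry aside, then delete it; return the names that failed."""
    failed = []
    for name in names:
        target = os.path.join(parent, name)
        aside = target + CLOBBER_SUFFIX
        try:
            # Clear a leftover from an earlier failed clobber first, so
            # renamed directories cannot pile up.
            if os.path.lexists(aside):
                remove_path(aside)
            if os.path.lexists(target):
                os.rename(target, aside)
                remove_path(aside)
        except OSError as exc:
            # Report on stderr and carry on with the remaining entries.
            print("failed to clobber %s: %s" % (target, exc), file=sys.stderr)
            failed.append(name)
    return failed
```

If the delete fails after the rename, the original name is already free, so the next build step can clone a fresh copy, and the leftover is retried on the next clobber.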
Assignee | ||
Comment 47•15 years ago
|
||
(In reply to comment #46) > prints any exceptions to stdout, I of course meant stderr.
Assignee | ||
Comment 48•15 years ago
|
||
(In reply to comment #46) > Created an attachment (id=382789) [details] > Move dirs aside before deleting them, v2 > > Same as previous patch, but moves the try block for purge_builds.py inside the > loop, and prints any exceptions to stdout so that we still have a fighting > chance of freeing the needed space. The updated patch ran successfully in staging last night. No failures in the logs (I checked Linux and Mac too), and there were no renamed dirs left lying around on the Windows slaves.
Updated•15 years ago
|
Attachment #382789 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 49•15 years ago
|
||
Comment on attachment 382789 [details] [diff] [review] Move dirs aside before deleting them, v2 changeset: 296:e145709d107f
Attachment #382789 -
Flags: checked‑in+
Assignee | ||
Comment 50•15 years ago
|
||
I'll monitor this over the weekend in case we see any issues (like before) in wider testing.
Comment 51•15 years ago
|
||
This is busted, causing clobberer to fail:
Removing mail/
Couldn't clobber properly, bailing out.
Manually added a raise statement to the offending bit of code to see what was going on:
Traceback (most recent call last):
File "../../tools/clobberer/clobberer.py", line 171, in <module>
do_clobber(options.dryrun, options.skip)
File "../../tools/clobberer/clobberer.py", line 90, in do_clobber
if d.endswith(clobber_suffix):
NameError: global name 'd' is not defined
Comment 52•15 years ago
|
||
Simple fix of an incorrectly named variable.
Attachment #383310 -
Flags: review?(catlee)
Assignee | ||
Comment 53•15 years ago
|
||
Comment on attachment 383310 [details] [diff] [review] [checked in] Fix variable name typo in v2 patch Ugh, cut-n-paste fail.
Updated•15 years ago
|
Attachment #383310 -
Flags: review?(catlee) → review+
Comment 54•15 years ago
|
||
Comment on attachment 383310 [details] [diff] [review] [checked in] Fix variable name typo in v2 patch changeset: 298:01f01389bbf0
Attachment #383310 -
Attachment description: Fix variable name typo in v2 patch → [checked in] Fix variable name typo in v2 patch
Assignee | ||
Comment 55•15 years ago
|
||
I forced a round of clobbers on m-c and both scripts seem to be working now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering