851270 - Don't clobber the source checkout

Reporter

Description

•

12 years ago

I was poking through build logs and noticed the following:

---
Executing: ['python', '/builds/slave/m-cen-osx64-000000000000000000/tools/buildfarm/utils/hgtool.py', '--mirror', 'http://hg-internal.dmz.scl3.mozilla.com/mozilla-central', '--bundle', 'http://ftp.mozilla.org/pub/mozilla.org/firefox/bundles/mozilla-central.hg', 'http://hg.mozilla.org/mozilla-central', 'build']
Checking if share extension works
command: START
command: hg help share
command: cwd: /builds/slave/m-cen-osx64-000000000000000000
command: output:
command: END (0.30 elapsed)
/builds/slave/m-cen-osx64-000000000000000000/build doesn't appear to be a valid hg directory; clobbering
command: START
command: hg path default
command: cwd: /builds/hg-shared/mozilla-central
command: output:
http://hg.mozilla.org/mozilla-central
command: END (0.20 elapsed)
Updating shared repo
command: START
command: hg path default
command: cwd: /builds/hg-shared/mozilla-central
command: output:
http://hg.mozilla.org/mozilla-central
command: END (0.16 elapsed)
command: START
command: hg pull -r b672877ed04656a9f4041f31ef5d8030df662726 http://hg-internal.dmz.scl3.mozilla.com/mozilla-central
command: cwd: /builds/hg-shared/mozilla-central
command: output:
pulling from http://hg-internal.dmz.scl3.mozilla.com/mozilla-central
searching for changes
adding changesets
adding manifests
adding file changes
added 5 changesets with 8 changes to 8 files
(run 'hg update' to get a working copy)
command: END (3.44s elapsed)
Trying to share /builds/hg-shared/mozilla-central to /builds/slave/m-cen-osx64-000000000000000000/build
command: START
command: hg share -U /builds/hg-shared/mozilla-central /builds/slave/m-cen-osx64-000000000000000000/build
command: cwd: /builds/slave/m-cen-osx64-000000000000000000
command: output:
command: END (0.14s elapsed)
command: START
command: hg update -C -r b672877ed04656a9f4041f31ef5d8030df662726
command: cwd: /builds/slave/m-cen-osx64-000000000000000000/build
command: output:
76788 files updated, 0 files merged, 0 files removed, 0 files unresolved
command: END (485.15s elapsed)
command: START
command: hg parent --template {node|short}
command: cwd: /builds/slave/m-cen-osx64-000000000000000000/build
command: output:
b672877ed046
command: END (0.77 elapsed)
Got revision b672877ed046
program finished with exit code 0
elapsedTime=490.712928
---

The interesting output there is:

---
command: START
command: hg update -C -r b672877ed04656a9f4041f31ef5d8030df662726
command: cwd: /builds/slave/m-cen-osx64-000000000000000000/build
command: output:
76788 files updated, 0 files merged, 0 files removed, 0 files unresolved
command: END (485.15s elapsed)
---

8:08 to update the Mercurial repository. Wat?! Looking at more logs, it appears this is common. Here's a Windows log:

---
command: START
command: hg update -C -r d3d9a1552a4d62a7b80c53b5af8ef6fd610b3ae6
command: cwd: e:\builds\moz2_slave\m-cen-w32-00000000000000000000\build
command: output:
76788 files updated, 0 files merged, 0 files removed, 0 files unresolved
command: END (664.56s elapsed)
---

11 minutes?!

It appears we are completely removing the source directory checkout and repopulating it from scratch. Why?

Instead of blowing away the entire source directory and performing a full update, we should instead clean out the old source directory and then update it to the revision specified. You could do this with something like:

hg st -n -i -u -0 | xargs -0 rm
hg up -C -r <rev>

This should complete in seconds rather than minutes and thus will make builds start a lot faster.
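For anyone who wants the `hg st | xargs rm` idea in Python rather than shell, here is a minimal sketch. It assumes the NUL-separated output of `hg st -n -i -u -0` has already been captured as a string; `parse_status0` and `remove_paths` are hypothetical names for illustration, not functions from hgtool.

```python
import os


def parse_status0(output):
    """Split the NUL-separated output of `hg st -n -i -u -0` into paths."""
    return [name for name in output.split("\0") if name]


def remove_paths(root, names):
    """Delete each listed file under root; return the names actually removed.

    Like `xargs -0 rm`, this only removes files, so empty directories
    are left behind. Names that no longer exist are skipped.
    """
    removed = []
    for name in names:
        path = os.path.join(root, name)
        if os.path.isfile(path) or os.path.islink(path):
            os.remove(path)
            removed.append(name)
    return removed
```

After this step the working copy still needs the `hg up -C -r <rev>` from above to land on the requested revision.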

Ed Morley [:emorley]

Updated

•

12 years ago

Whiteboard: [buildfaster:P1][capacity]

Chris AtLee [:catlee]

Comment 1

•

12 years ago

We're not blowing away the source directory and repopulating it from scratch. This only happens for dep builds if the build directory has been clobbered for some reason. Try and nightly builds are always clobbers. If we're seeing this happen in general in dep builds that haven't been clobbered, then that's definitely a bug.

Gregory Szorc [:gps]

Reporter

Comment 2

•

12 years ago

We should never need to delete the source directory, even for clobber builds. The hg st command I posted plus an up -C is equivalent to a clean checkout and should be much more efficient. I double checked and non-clobber builds on inbound do appear to be sane. However, Try exhibits the bad behavior (presumably since all try builds are clobber builds).

Summary: More intelligent handling of Mercurial updates → Don't clobber the source checkout

Gregory Szorc [:gps]

Reporter

Comment 3

•

12 years ago

Note that the full |hg up| is only part of the problem: the clobber step (rm -rf) can consume many minutes as well. I'm looking at a Try Windows log where rm -rf takes 9 minutes! I reckon this change could make Windows clobber builds start 15+ minutes faster!

Jeff Hammel

Comment 4

•

12 years ago

iirc, there is a faster rm -rf that :jhford (i believe?) wrote for windows; not sure if that is/can be utilized. Though I agree, avoiding altogether is even better

Mike Hommey [:glandium]

Comment 5

•

12 years ago

I think i wrote it in some other bug, but we could leverage file system level snapshot facilities (volume shadow copy, hfs+ snapshots, lvm snapshots/zfs/btrfs) to replace rm -rf.

Gregory Szorc [:gps]

Reporter

Comment 6

•

12 years ago

We could also mount the source checkout as a read-only filesystem. But that would require changes to the build such that we never write files into the source tree. Do we have a tracking bug on that?

Mike Hommey [:glandium]

Comment 7

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #0) > hg st -n -i -u -0 | xargs -0 rm hg purge --all ?

John Ford [:jhford] CET/CEST Berlin Time

Comment 8

•

12 years ago

I definitely wrote a win32 native program that did rm -rf using native windows calls. It was deployed prematurely and there were a couple issues with it on non-wow64 systems, iirc. I don't know how much interest there still is in it and I'm not sure where a good version of the code ended up.

Ed Morley [:emorley]

Comment 9

•

12 years ago

Bug 727551 / http://hg.mozilla.org/users/jford_mozilla.com/jhford-native-rm/ afaict :-)

Gregory Szorc [:gps]

Reporter

Comment 10

•

12 years ago

Attached patch Part 1: Add purge ability to hgtool, v1 (obsolete) — Details — Splinter Review

I gave hgtool the ability to purge. When you pass --purge to hgtool, it will |hg purge -a --all| before |hg up|. This is effectively a clobber without actually blowing away the source dir. It should remove all the object directories and any temporary files and directories created in the source directory. If hg purge fails, it will exit with a non-0 status code. This should be caught by existing except blocks and trigger a full rm -rf style clobber. I have not tested this patch. I have no clue how to test it. I'd love it if someone else took this patch and ran it to the finish line.

Attachment #725229 - Flags: review?(bhearsum)

Gregory Szorc [:gps]

Reporter

Comment 11

•

12 years ago

Attached patch Part 2: Purge, don't clobber, try builds, v1 (obsolete) — Details — Splinter Review

I removed the rm -rf step from Try builds and added --purge to the hgtool invocation. The net effect should be the same. Again, I haven't tested this and have no clue how to test this. There are other parts in this file that need the clobber logic updated. I figured Try would be a good place to start.

Attachment #725230 - Flags: review?(bhearsum)

(not currently active) Ted Mielczarek

Comment 12

•

12 years ago

Note that hg purge is an extension that needs to be enabled.

Gregory Szorc [:gps]

Reporter

Comment 13

•

12 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12) > Note that hg purge is an extension that needs to be enabled. Ahh, good point. Presumably there is a .hgrc file lingering in one of these repositories? If not, we can always emulate hg purge with hg st | rm (see my initial comment).

Mike Hommey [:glandium]

Comment 14

•

12 years ago

.hgrc doesn't need to be in the repository. There are other places mercurial looks for one.

Gregory Szorc [:gps]

Reporter

Comment 15

•

12 years ago

(In reply to Mike Hommey [:glandium] from comment #14) > .hgrc doesn't need to be in the repository. There are other places mercurial > looks for one. I know how hgrc works :) What I meant was I'm guessing there is an hgrc file in build-tools or buildbotcustom or some other repo that is automagically installed as part of running jobs.

Chris AtLee [:catlee]

Comment 16

•

12 years ago

Our hgrc files are centrally managed by puppet (on linux and OSX), so as long as the version of hg we have on there supports purge, it's easy to enable. hgtool could test if the purge extension is enabled, and use some fallback code if it doesn't.
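A rough sketch of what such a check could look like, modelled loosely on the "Checking if share extension works" step in the log from comment 0. The function name, the injected `get_output` callable and the marker strings are all assumptions for illustration; what `hg help <extension>` prints when an extension is disabled varies between Mercurial versions, so the markers would need checking against the deployed hg.

```python
def extension_available(name, get_output,
                        disabled_markers=("unknown command",
                                          "is provided by the following extension")):
    """Guess whether an hg extension is enabled by inspecting `hg help <name>`.

    get_output is a caller-supplied function taking a command list and
    returning (returncode, text). The disabled_markers defaults are
    assumptions, not verified strings for any particular hg version.
    """
    returncode, text = get_output(["hg", "help", name])
    if returncode != 0:
        return False
    return not any(marker in text for marker in disabled_markers)
```

Passing `get_output` in keeps the check testable without a real hg binary; the caller would use the fallback clobber whenever this returns False.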

Ed Morley [:emorley]

Comment 17

•

12 years ago

Afaict purge has been a bundled extension since forever: http://selenic.com/repo/hg/log/a07be8953733/hgext/purge.py (hg v1.0 was released in 2008, so after it was moved into hgext/) ...so we shouldn't even need the fallback :-)

Gregory Szorc [:gps]

Reporter

Comment 18

•

12 years ago

I think that since we control the entire stack and given the performance repercussions if purge is not used, we should explicitly require purge. If nothing else, supporting N - 1 configurations is easier and less error prone than N. KISS.

bhearsum@mozilla.com (:bhearsum)

Comment 19

•

12 years ago

Comment on attachment 725229 [details] [diff] [review] Part 1: Add purge ability to hgtool, v1

Review of attachment 725229 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Gregory Szorc [:gps] from comment #2)
> We should never need to delete the source directory, even for clobber builds. The hg st command I posted plus an up -C is equivalent to a clean checkout and should be much more efficient.
>
> I double checked and non-clobber builds on inbound do appear to be sane. However, Try exhibits the bad behavior (presumably since all try builds are clobber builds).

Just wanted to note that we will *still* see full source clobbers due to the ensure-we-have-enough-disk-space script that runs at the start of most jobs. This is still great to have though, especially on try!

(In reply to Gregory Szorc [:gps] from comment #18)
> I think that since we control the entire stack and given the performance repercussions if purge is not used, we should explicitly require purge. If nothing else, supporting N - 1 configurations is easier and less error prone than N. KISS.

I know what you're saying here, but I respectfully disagree. Even though we control the whole stack, we still don't want to make the assumption that "purge" is always available. Changes to puppet, addition of new platforms, etc. all can cause us to lose our hgrc and end up with "purge" being unavailable. We need to be able to fallback gracefully to reduce the possibility of unnecessary tree burning. You can use the "hg share" logic as a model to test for purge: https://github.com/mozilla/build-tools/blob/master/lib/python/util/hg.py#L345

With that done, I think we should just drop the --purge option from hgtool altogether and make it the default and only fall back to deleting the entire directory if purge is unavailable or doesn't work for some reason.

::: lib/python/util/hg.py
@@ +98,5 @@
> ver = (0, 0, 0)
> log.debug("Running hg version %s", ver)
> return ver
>
> +def do_purge(dest):

Naming nit: drop the "do_" to match the style of the rest of this file. Also, please add a test or two (https://github.com/mozilla/build-tools/blob/master/lib/python/mozilla_buildtools/test/test_util_hg.py).

Attachment #725229 - Flags: review?(bhearsum) → review-

bhearsum@mozilla.com (:bhearsum)

Comment 20

•

12 years ago

Comment on attachment 725230 [details] [diff] [review] Part 2: Purge, don't clobber, try builds, v1 This patch won't be necessary, per my previous comment about making purging the normal behaviour.

Attachment #725230 - Attachment is obsolete: true

Attachment #725230 - Flags: review?(bhearsum)

Ed Morley [:emorley]

Comment 21

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #19)
> I know what you're saying here, but I respectfully disagree. Even though we control the whole stack, we still don't want to make the assumption that "purge" is always available. Changes to puppet, addition of new platforms, etc. all can cause us to lose our hgrc and end up with "purge" being unavailable. We need to be able to fallback gracefully to reduce the possibility of unnecessary tree burning.

The only downside is that we potentially forget to update an hgrc and go 6 months without realising we are doing the fallback method and wasting a load of builder time. I'd rather this be noisy if possible.

bhearsum@mozilla.com (:bhearsum)

Comment 22

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #21) > (In reply to Ben Hearsum [:bhearsum] from comment #19) > > I know what you're saying here, but I respectfully disagree. Even though we > > control the whole stack, we still don't want to make the assumption that > > "purge" is always available. Changes to puppet, addition of new platforms, > > etc. all can cause us to lose our hgrc and end up with "purge" being > > unavailable. We need to be able to fallback gracefully to reduce the > > possibility of unnecessary tree burning. > > The only downside is that we potentially forget to update an hgrc and go 6 > months without realising we are doing the fallback method and wasting a load > of builder time. I'd rather this be noisy is possible. Can we throw a warning that TBPL or some other tool can pick up?

Ed Morley [:emorley]

Comment 23

•

12 years ago

We could either turn the job orange (but not burn the build), or else keep it green, but notify in the tinderbox print area (similar to how we do for the "this was a clobber build" message). ie output to the log: TinderboxPrint:hg purge extension not found! Using inefficient srcdir clobber! etc

bhearsum@mozilla.com (:bhearsum)

Comment 24

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #23)
> We could either turn the job orange (but not burn the build), or else keep it green, but notify in the tinderbox print area (similar to how we do for the "this was a clobber build" message). ie output to the log:
>
> TinderboxPrint:hg purge extension not found! Using inefficient srcdir clobber!

My goal here is not to close the tree if we screw up an hgrc. If turning the job orange because of this would close the tree, let's do the latter. If the orange wouldn't be a big deal, either sounds fine to me.

Ed Morley [:emorley]

Comment 25

•

12 years ago

Orange on a build or two on a newly added platform/tree sounds manageable - we'll just star as we do for any other failure until purge has been added to the hgrc :-)

bhearsum@mozilla.com (:bhearsum)

Updated

•

12 years ago

Depends on: 852170

bhearsum@mozilla.com (:bhearsum)

Comment 26

•

12 years ago

Attached patch look for inefficient clobbers (obsolete) — Details — Splinter Review

(In reply to Ed Morley [:edmorley UTC+0] from comment #25)
> Orange on a build or two on a newly added platform/tree sounds manageable - we'll just star as we do for any other failure until purge has been added to the hgrc :-)

Okay, this patch adds an error regex for the string you suggested, and will turn the build orange if it's found. We just need to get hgtool to print this to make it work.
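To illustrate the idea only: a log scanner could match the TinderboxPrint string from comment 23 with something like the sketch below. The pattern and the function name are hypothetical; the regex in the attached patch is not shown in this bug and may well differ.

```python
import re

# Hypothetical pattern for the message suggested in comment 23; the regex
# actually used in the attached patch is not visible in this bug.
INEFFICIENT_CLOBBER_RE = re.compile(
    r"^TinderboxPrint:\s*hg purge extension not found!")


def has_inefficient_clobber(log_lines):
    """Return True if any log line carries the inefficient-clobber notice."""
    return any(INEFFICIENT_CLOBBER_RE.search(line) for line in log_lines)
```

The pattern is anchored at the start of the line, so a log line that merely quotes the message elsewhere does not trip the check.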

Attachment #726201 - Flags: review?(emorley)

Gregory Szorc [:gps]

Reporter

Comment 27

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #19)
> (In reply to Gregory Szorc [:gps] from comment #18)
> > I think that since we control the entire stack and given the performance repercussions if purge is not used, we should explicitly require purge. If nothing else, supporting N - 1 configurations is easier and less error prone than N. KISS.
>
> I know what you're saying here, but I respectfully disagree. Even though we control the whole stack, we still don't want to make the assumption that "purge" is always available. Changes to puppet, addition of new platforms, etc. all can cause us to lose our hgrc and end up with "purge" being unavailable. We need to be able to fallback gracefully to reduce the possibility of unnecessary tree burning. You can use the "hg share" logic as a model to test for purge:
> https://github.com/mozilla/build-tools/blob/master/lib/python/util/hg.py#L345

I don't fully understand the complexities and requirements of the buildbot infrastructure, but I still think we should require purge. If a bad hgrc is rolled out, then the recently deployed system configuration is bad and should be reverted. We would have the system qualify/test builds instead of leaving it up to humans to fix warnings. What's wrong with this strictness?

> With that done, I think we should just drop the --purge option from hgtool altogether and make it the default and only fall back to deleting the entire directory if purge is unavailable or doesn't work for some reason.

It's not that simple.

Before, "clobber" and "update" were two separate steps in buildbot land. Now, they are one. That step needs to distinguish between "clobber and update" and "just update." --purge provides that distinction.

In the ideal world, I'd say we should purge on every build just because we're not supposed to be putting things in the srcdir. However, we can't simply do this because the objdir is a child directory of the srcdir and the purge would remove the objdir. In my ideal world, we should move the objdir so it isn't a child of the srcdir. Then we can more easily clobber one or the other.

We can also do things like mount the srcdir on a read-only filesystem to enforce the "no writes in srcdir are allowed" ideal.

We /could/ exclude the objdir from the purges. However, we would need to pass a pattern to purge telling it what directories are object directories. Do they always begin with obj-*? Can you guarantee this for all time on all platforms? That assumption seems rather fragile to me. Even if we do that, we still need a way to determine whether we're clobbering the world.

I guess what I'm trying to say is I don't see a way to achieve the optimal end state without introducing a new command argument.

> ::: lib/python/util/hg.py
> @@ +98,5 @@
> > ver = (0, 0, 0)
> > log.debug("Running hg version %s", ver)
> > return ver
> >
> > +def do_purge(dest):
>
> Naming nit: drop the "do_" to match the style of the rest of this file. Also, please add a test or two (https://github.com/mozilla/build-tools/blob/master/lib/python/mozilla_buildtools/test/test_util_hg.py).

Yeah, I had to do this because "purge" was a function argument and would have overridden the global function name.

Finally, fixing this issue is beyond my regular sphere of responsibility. I only submitted the patches because I was poking around to see what was going on and realized the fix seemed trivial. Since Ben and others have opinions here and are more familiar with the systems, I'd appreciate if someone else carried this bug to completion.

Aki Sasaki (not active)

Comment 28

•

12 years ago

I think you can enable purge, even if the .hgrc is bad or missing, by hg --config extensions.purge= purge

bhearsum@mozilla.com (:bhearsum)

Comment 29

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #27)
> I don't fully understand the complexities and requirements of the buildbot infrastructure, but I still think we should require purge. If a bad hgrc is rolled out, then the recently deployed system configuration is bad and should be reverted. We would have the system qualify/test builds instead of leaving it up to humans to fix warnings. What's wrong with this strictness?

The problem is that if a bad update is rolled out, we could be burning jobs for hours or longer (and thus, holding the tree closed), before it's noticed and fixed. I agree 100% that we should be using purge everywhere, I just don't think we should be enforcing that by unnecessarily burning jobs.

> > With that done, I think we should just drop the --purge option from hgtool altogether and make it the default and only fall back to deleting the entire directory if purge is unavailable or doesn't work for some reason.
>
> It's not that simple.
>
> Before, "clobber" and "update" were two separate steps in buildbot land. Now, they are one. That step needs to distinguish between "clobber and update" and "just update." --purge provides that distinction.

We might just be having a difference in terminology here...the commands from comment #0 are being run from hgtool.py or one of the libraries it calls - not Buildbot. We can modify hgtool/mercurial() to do whatever we want. In this case I think we want to:
- Try to purge if the directory exists
-- If purge fails, rm -rf and repopulate the directory (using the existing logic that does that).
-- If purge succeeds, pull/update (using the existing logic that does that).

> In the ideal world, I'd say we should purge on every build just because we're not supposed to be putting things in the srcdir. However, we can't simply do this because the objdir is a child directory of the srcdir and the purge would remove the objdir. In my ideal world, we should move the objdir so it isn't a child of the srcdir. Then we can more easily clobber one or the other.

I filed bug 852199 on moving the objdirs.

> We can also do things like mount the srcdir on a read-only filesystem to enforce the "no writes in srcdir are allowed" ideal.

Interesting idea, not sure how well it works for platforms other than Linux. Feel free to open a bug on it, though.

> We /could/ exclude the objdir from the purges. However, we would need to pass a pattern to purge telling it what directories are object directories. Do they always begin with obj-*?
>
> I guess what I'm trying to say is I don't see a way to achieve the optimal end state without introducing a new command argument.

An alternative here would be to write out an .hgignore before running hgtool.py.

> > ::: lib/python/util/hg.py
> > @@ +98,5 @@
> > > ver = (0, 0, 0)
> > > log.debug("Running hg version %s", ver)
> > > return ver
> > >
> > > +def do_purge(dest):
> >
> > Naming nit: drop the "do_" to match the style of the rest of this file. Also, please add a test or two (https://github.com/mozilla/build-tools/blob/master/lib/python/mozilla_buildtools/test/test_util_hg.py).
>
> Yeah, I had to do this because "purge" was a function argument and would have overridden the global function name.

Ah, OK.

> Finally, fixing this issue is beyond my regular sphere of responsibility. I only submitted the patches because I was poking around to see what was going on and realized the fix seemed trivial. Since Ben and others have opinions here and are more familiar with the systems, I'd appreciate if someone else carried this bug to completion.

OK, I'll try to find an owner for this. Thanks for your initial work.
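The purge-then-fall-back flow proposed in this comment can be sketched roughly as follows. This is only an illustration of the control flow: `purge_or_clobber` and its three callables are hypothetical stand-ins for hgtool's existing purge, rm -rf and clone/pull/update logic, none of which is reproduced here.

```python
import os


def purge_or_clobber(dest, purge, clobber, update):
    """Purge an existing checkout, falling back to a clobber if purge fails.

    purge, clobber and update are caller-supplied callables taking dest;
    update stands in for the existing clone/pull/update logic.
    Returns which path was taken: "purge", "clobber" or "fresh".
    """
    if not os.path.exists(dest):
        path_taken = "fresh"           # nothing to clean; update repopulates
    else:
        try:
            purge(dest)
            path_taken = "purge"
        except Exception:              # broad on purpose: any failure means clobber
            clobber(dest)
            path_taken = "clobber"
    update(dest)
    return path_taken
```

Returning the path taken makes it easy for the caller to log (or TinderboxPrint) when the slow fallback was used.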

bhearsum@mozilla.com (:bhearsum)

Comment 30

•

12 years ago

(In reply to Aki Sasaki [:aki] from comment #28) > I think you can enable purge, even if the .hgrc is bad or missing, by > > hg --config extensions.purge= purge In fact....we already do this for mq: https://github.com/mozilla/build-tools/blob/master/lib/python/util/hg.py#L486 So we can just do that here and not worry about bug 852170.

Jeff Hammel

Comment 31

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #29)
> (In reply to Gregory Szorc [:gps] from comment #27)
> > I don't fully understand the complexities and requirements of the buildbot infrastructure, but I still think we should require purge. If a bad hgrc is rolled out, then the recently deployed system configuration is bad and should be reverted. We would have the system qualify/test builds instead of leaving it up to humans to fix warnings. What's wrong with this strictness?
>
> The problem is that if a bad update is rolled out, we could be burning jobs for hours or longer (and thus, holding the tree closed), before it's noticed and fixed. I agree 100% that we should be using purge everywhere, I just don't think we should be enforcing that by unnecessarily burning jobs.

<and other comments>

From an onlooker point of view, it seems like what *ideally* we would have would be a way of introspecting and noting "build"/machine issues separately from burning jobs. Instead, orange v. no orange on the basis of, here, "is purgeable?" (+ tinderboxprint) is being focused on since that's all we can trivially do with our current infrastructure. Not saying that a deeper system can/should be done/specced now/ever, but I think it's worth noting when a solution is a quick measure v something in good form

Ed Morley [:emorley]

Comment 32

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #29) > > We > > /could/ exclude the objdir from the purges. However, we would need to pass a > > An alternative here would be to write out an .hgignore before running > hgtool.py. .hgignore already excludes ^obj :-)

Gregory Szorc [:gps]

Reporter

Comment 33

•

12 years ago

We want to purge ignored files from the source dir, especially on Try. We sometimes don't want to purge the objdir.

Ed Morley [:emorley]

Comment 34

•

12 years ago

Comment on attachment 726201 [details] [diff] [review] look for inefficient clobbers This patch is no longer necessary if we go with the solution in comment 30.

Attachment #726201 - Attachment is obsolete: true

Attachment #726201 - Flags: review?(emorley)

bhearsum@mozilla.com (:bhearsum)

Comment 35

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #2)
> We should never need to delete the source directory, even for clobber builds. The hg st command I posted plus an up -C is equivalent to a clean checkout and should be much more efficient.

This comment slipped by me earlier. I just wanted to note that the reason we clobber things sometimes is disk space. We have one build directory per job (where a job is as specific as "mozilla-central opt build" or "mozilla-inbound b2g desktop"). Because we have hundreds of different jobs, we can't keep src/objdirs forever. We make up for this a bit by favouring slaves that have most recently done a build of the same type when choosing one for a pending job.

Having re-read the first part of this bug I'm actually confused about what running purge is buying us, except in the extreme cases where we can't update the existing checkout (eg, try repo reset).

Jeff Hammel

Comment 36

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #31) <snip/> > From a onlooker point of view, it seems like what *ideally* we would have > would be a way of introspecting and noting "build"/machine issues separately > from burning jobs. Instead, orange v. no orange on the basis of, here, "is > purgeable?" (+ tinderboxprint) is being focused on since that's all we can > trivially do with our current infrastructure. Not saying that a deeper > system can/should be done/specced now/ever, but I think its worth noting > when a solution is a quick measure v something in good form Will file this as requested. Meant to note more explicitly that this is completely out of scope for this bug

Gregory Szorc [:gps]

Reporter

Comment 37

•

12 years ago

Purging saves you from having to recreate all the files in the srcdir. For any 2 builds, the work required to switch from rev A to B will likely be much smaller than the work required to perform a fresh checkout. The purge is to ensure any remnants from a previous build are gone, thus treating things like a clobber.

bhearsum@mozilla.com (:bhearsum)

Comment 38

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #37) > Purges saves you from having to recreate all the files in the srcdir. For > any 2 builds, the work required to switch from rev A to B will likely be > much smaller than the work required to perform a fresh checkout. The purge > is to ensure any remnants from a previous build are gone, thus treating > things like a clobber. Ah. So, it sounds like this would be primarily useful for nightly builds, where we always start with a completely fresh repository. We already don't clobber dep builds as part of the build process (but they do sometimes get clobbered to free up space, as noted earlier). It would be a correctness fix for those builds still.

Ed Morley [:emorley]

Comment 39

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #38) > Ah. So, it sounds like this would be primarily useful for nightly builds And Try surely?

Gregory Szorc [:gps]

Reporter

Comment 40

•

12 years ago

It's 2013: why are we talking about not enough free disk space?

Jeff Hammel

Comment 41

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #36) > (In reply to Jeff Hammel [:jhammel] from comment #31) > <snip/> > > From a onlooker point of view, it seems like what *ideally* we would have > > would be a way of introspecting and noting "build"/machine issues separately > > from burning jobs. Instead, orange v. no orange on the basis of, here, "is > > purgeable?" (+ tinderboxprint) is being focused on since that's all we can > > trivially do with our current infrastructure. Not saying that a deeper > > system can/should be done/specced now/ever, but I think its worth noting > > when a solution is a quick measure v something in good form > > Will file this as requested. Meant to note more explicitly that this is > completely out of scope for this bug bug 852357

Gregory Szorc [:gps]

Reporter

Comment 42

•

12 years ago

According to http://brasstacks.mozilla.com/gofaster/#/overhead/build our win32 builders spend ~1 hour in "build setup and teardown." http://brasstacks.mozilla.com/gofaster/#/executiontime/build shows that our total win32 build time is ~2 hours (I'm pretty sure this doesn't include "build setup and teardown"). Putting the two together, ~33% of the build time for win32 builders is spent in setup and teardown. Or, as I'll call it "buildbot overhead." And, from my earlier comments, we are losing 15+ minutes (possibly even 20+) to the issues described in this bug. Assuming it's 15 minutes exactly and assuming the 1 and 2 hour numbers above are exact, build times would go from 180 to 165 minutes. That's a 9% increase in builder capacity.
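For the record, the arithmetic behind those percentages, taking the round numbers in this comment at face value (the variable names are just for illustration):

```python
setup_min = 60.0    # "~1 hour" of build setup and teardown
build_min = 120.0   # "~2 hours" of build time
saved_min = 15.0    # assumed saving from not clobbering the source checkout

total_before = setup_min + build_min        # 180 minutes per job
total_after = total_before - saved_min      # 165 minutes per job

# Share of each job spent in setup and teardown: about 33%.
overhead_share = setup_min / total_before

# Jobs per builder scale with 1 / job time, so capacity grows by
# 180 / 165 - 1, which is about 9.1%.
capacity_gain = total_before / total_after - 1

print("overhead: %.0f%%, capacity gain: %.1f%%"
      % (overhead_share * 100, capacity_gain * 100))
```

The 9% figure is a capacity increase; the corresponding reduction in time per job is 15 / 180, a little over 8%.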

Nick Thomas [:nthomas] (UTC+12)

Comment 43

•

12 years ago

I think we need to be a bit careful with that overhead number. Everything that doesn't match this config http://mxr.mozilla.org/build/source/braindump/reports/buildfaster_report.py#243 counts as setup/tear down in buildfaster. Note that it's not including long-running things like 'make check'. Taking a (statistically insignificant) sample of two builds: * total time: 135min, previously clobbered and took 10mins to get a working copy of the source from the hg share already on disk, 52min compile, 8 mins building symbols and packaging, 58min make check * total time: 117 min, true dep build with 75 sec hg update, 41 min compile, 8 min packaging, 57 min make check For the first example, the clobbering would have taken a few minutes on some other job. Even with that, 'buildbot overhead' is a lot less than a third of the total time.

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Assignee: nobody → mgervasini

John Hopkins (:jhopkins)

Comment 44

•

12 years ago

Worth noting that very long paths on Windows cannot be deleted by mercurial:

C:\Users\jhopkins\tmp\buildbot-configs>hg purge --all
0xxxxxxxxxxxxxxxxxxxx\1xxxxxxxxxxxxxxxxxxxx\2xxxxxxxxxxxxxxxxxxxx\3xxxxxxxxxxxxxxxxxxxx\4xxxxxxxxxxxxxxxxxxxx\5xxxxxxxxxxxxxxxxxxxx\6xxxxxxxxxxxxxxxxxxxx\7xxxxxxxxxxxxxxxxxxxx\8xxxxxxxxxxxxxxxxxxxx\9xxxxxxxxxxxxxxxxxxxx: The system cannot find the path specified
abort: The system cannot find the path specified: 'C:\Users\jhopkins\tmp\buildbot-configs\0xxxxxxxxxxxxxxxxxxxx/1xxxxxxxxxxxxxxxxxxxx/2xxxxxxxxxxxxxxxxxxxx/3xxxxxxxxxxxxxxxxxxxx/4xxxxxxxxxxxxxxxxxxxx/5xxxxxxxxxxxxxxxxxxxx/6xxxxxxxxxxxxxxxxxxxx/7xxxxxxxxxxxxxxxxxxxx/8xxxxxxxxxxxxxxxxxxxx/9xxxxxxxxxxxxxxxxxxxx/*.*'

catlee pointed out that we can fall back to a clobber when that happens.

Chris AtLee [:catlee]

Comment 45

•

12 years ago

We should add an explicit test for this in hgtool's test suite.

Massimo Gervasini [:massimo]

Assignee

Comment 46

•

12 years ago

Attached patch based on gps patch: renamed do_purge, added tests (obsolete)

Attachment #725229 - Attachment is obsolete: true

Attachment #738505 - Flags: feedback?(catlee)

Massimo Gervasini [:massimo]

Assignee

Comment 47

•

12 years ago

Attached patch enabling hg --purge on try branch

Enabling purge on try branch

Attachment #738509 - Flags: feedback?(catlee)

Ed Morley [:emorley]

Updated

•

12 years ago

Attachment #738509 - Attachment is patch: true

Attachment #738509 - Attachment mime type: text/x-patch → text/plain

Chris AtLee [:catlee]

Comment 48

•

12 years ago

Comment on attachment 738509
enabling hg --purge on try branch

Review of attachment 738509:
-----------------------------------------------------------------

Looks good! Can you add this to your ~/.hgrc:

[diff]
git=1
showfunc = 1

Having the class/function names is really helpful for reading the diffs.

Attachment #738509 - Flags: feedback?(catlee) → feedback+

Chris AtLee [:catlee]

Comment 49

•

12 years ago

Comment on attachment 738505
based on gps patch: renamed do_purge, added tests

Review of attachment 738505:
-----------------------------------------------------------------

looks good! Ideally we should handle those bad long filenames on windows. We definitely need to re-raise the exception from inside purge() so that the calling functions can remove dest.

::: lib/python/mozilla_buildtools/test/test_util_hg.py
@@ +325,5 @@
> + open(fileToModify, 'w').write('just a test')
> + purge(self.wc)
> + content = open(fileToModify).read()
> + self.assertEqual(content, 'just a test')
> +

can we add a test for purging really long filenames?

::: lib/python/util/hg.py
@@ +105,5 @@
> + run_cmd(['hg', '--config', 'extensions.purge=', 'purge', '-a', '--all'], cwd=dest)
> + except subprocess.CalledProcessError:
> + # purge failed
> + # https://bugzilla.mozilla.org/show_bug.cgi?id=851270#c44
> + log.error('purge failed (%s)' % dest)

Ok, I think probably the best thing to do is to re-raise the exception here. Both calls below are inside try/except clauses which call remove_dest after the exception is caught.

remove_dest should ideally be adjusted with jhopkins' logic from clobberer

@@ +390,4 @@
> remove_path(dest)
> elif not os.path.exists(os.path.join(dest, ".hg", "sharedpath")):
> try:
> + purge(dest)

should this have an 'if autoPurge:' guard?
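To make the "log and re-raise" suggestion concrete, here is a minimal sketch. It is not the patch under review: `subprocess.check_call` stands in for the tools repo's `run_cmd`, and the `run` parameter is a hypothetical hook added only so the function can be tested without hg installed.

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def purge(dest, run=subprocess.check_call):
    """Purge untracked and ignored files in dest, re-raising on failure."""
    cmd = ['hg', '--config', 'extensions.purge=', 'purge', '-a', '--all']
    try:
        run(cmd, cwd=dest)
    except subprocess.CalledProcessError:
        # Record the failure in the build log, then let the caller decide
        # what to do (for example remove dest and start from scratch).
        log.error('purge failed (%s)', dest)
        raise
```

With the bare `raise`, the caller's existing try/except sees the original exception, so its cleanup path runs exactly as it would for any other failed hg command.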

Attachment #738505 - Flags: feedback?(catlee) → feedback-

John Hopkins (:jhopkins)

Comment 50

•

12 years ago

FYI, Aki ported my Windows clobber routine to Mozharness and added unit tests: https://hg.mozilla.org/build/mozharness/rev/0308cdea2f32

Massimo Gervasini [:massimo]

Assignee

Comment 51

•

12 years ago

Attached patch better handling for windows long paths, added tests (obsolete)

changes in this patch:
* added missing 'if autoPurge' check
* ported mozharness _rmtree_windows() in commands.py
* added long path test
* remove exception handling when purge fails (just log and re-raise)
* added tox.ini for testing

Attachment #738505 - Attachment is obsolete: true

Attachment #741471 - Flags: feedback?(catlee)

Massimo Gervasini [:massimo]

Assignee

Comment 52

•

12 years ago

Attached patch updated _rmtree_windows(), added cwd in purge command (obsolete)

* ported fixes made in mozharness _rmtree_windows().
* added cwd=os.path.normpath(os.path.join(dest, '..')) in purge command

Attachment #741471 - Attachment is obsolete: true

Attachment #741471 - Flags: feedback?(catlee)

Attachment #742301 - Flags: feedback?(catlee)

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #742301 - Flags: feedback?(catlee) → feedback+

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 53

•

12 years ago

Comment on attachment 742301
updated _rmtree_windows(), added cwd in purge command

Review of attachment 742301:
-----------------------------------------------------------------

::: lib/python/mozilla_buildtools/test/test_util_hg.py
@@ +309,5 @@
> +
> + def testPurgeUntrackedDirectory(self):
> + rev = clone(self.repodir, self.wc, update_dest=False)
> + self.assertEquals(rev, None)
> + directoryToPurge = os.path.join(self.wc, 'directoyTopurge')

Was this meant to say 'directoy'?

::: lib/python/util/commands.py
@@ +1,1 @@
> +# ***** BEGIN LICENSE BLOCK *****

"***** BEGIN LICENSE BLOCK *****" is a MPL1.1-ism, and is no longer encouraged.

::: lib/python/util/hg.py
@@ +195,5 @@
> # We need to make sure our paths are correct though
> if os.path.exists(os.path.join(dest, '.hg')):
> adjust_paths(dest, default=repo)
> + return mercurial(repo, dest, branch, revision, autoPurge=True,
> + update_dest=update_dest, clone_by_rev=clone_by_rev,)

Why the extra comma?

Ed Morley [:emorley]

Comment 54

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #35)
> (In reply to Gregory Szorc [:gps] from comment #2)
> > We should never need to delete the source directory, even for clobber
> > builds.
>
> This comment slipped by me earlier. I just wanted to note that the reason we
> clobber things sometimes is disk space. We have one build directory per job
> (where a job is as specific as "mozilla-central opt build" or
> "mozilla-inbound b2g desktop". Because we have hundreds of different jobs,
> we can't keep src/objdirs forever.

I understand that we can't keep objdirs forever (too many permutations), however there's no reason to need to delete the srcdir checkouts, since there should only need to be one per tree & they are smaller than the objdirs. Or do we create a new checkout for each of opt/debug/...? (In which case that's something we should fix asap).

(In reply to Nick Thomas [:nthomas] from comment #43)
> I think we need to be a bit careful with that overhead number. Everything
> that doesn't match this config
>
> http://mxr.mozilla.org/build/source/braindump/reports/buildfaster_report.py#243
> counts as setup/tear down in buildfaster.

I'm sitting watching an inprogress m-c linux nightly retrigger where I've had to force a clobber on multiple platforms using the clobberer (something that happens very often due to our sucky build dependencies), and it's spent 16 minutes in the checking_clobber_times step (removing old srcdirs and objdirs) & another 10 mins in the hg_update step (checking mozilla-central back out again for the lolz), according to the buildbot master summary.

Whilst I follow that in the specific examples below the overhead may have been overstated, this bug still is impacting us significantly.

Chris AtLee [:catlee]

Comment 55

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+1] from comment #54)
> (In reply to Ben Hearsum [:bhearsum] from comment #35)
> > (In reply to Gregory Szorc [:gps] from comment #2)
> > > We should never need to delete the source directory, even for clobber
> > > builds.
> >
> > This comment slipped by me earlier. I just wanted to note that the reason we
> > clobber things sometimes is disk space. We have one build directory per job
> > (where a job is as specific as "mozilla-central opt build" or
> > "mozilla-inbound b2g desktop". Because we have hundreds of different jobs,
> > we can't keep src/objdirs forever.
>
> I understand that we can't keep objdirs forever (too many permutations),
> however there's no reason to need to delete the srcdir checkouts, since
> there should only need to be one per tree & they are smaller than the
> objdirs. Or do we create a new checkout for each of opt/debug/...? (In which
> case that's something we should fix asap).

Yes, each type of build gets its own source checkout. Changing this would be a significant undertaking I think, since most of the build scripts assume they're operating out of the root of a source checkout.

> (In reply to Nick Thomas [:nthomas] from comment #43)
> > I think we need to be a bit careful with that overhead number. Everything
> > that doesn't match this config
> >
> > http://mxr.mozilla.org/build/source/braindump/reports/buildfaster_report.py#243
> > counts as setup/tear down in buildfaster.
>
> I'm sitting watching an inprogress m-c linux nightly retrigger where I've
> had to force a clobber on multiple platforms using the clobberer (something
> that happens very often due to our sucky build dependencies), and it's spent
> 16 minutes in the checking_clobber_times step (removing old srcdirs and
> objdirs)

Ideally we do this before starting buildbot so it's not in the critical path of doing builds (bug 699195)

> & another 10 mins in the hg_update step (checking mozilla-central
> back out again for the lolz), according to the buildbot master summary.

This machine didn't have a local copy of m-c for whatever reason. It most likely got deleted to free up space for another build. The code that does that is here:
http://hg.mozilla.org/build/tools/file/default/buildfarm/maintenance/purge_builds.py#l242

> Whilst I follow that in the specific examples below the overhead may have
> been overstated, this bug still is impacting us significantly.

Here are some numbers for the past week. I've looked at the time taken for 'hg_update' steps.

number of checkouts: 10756
mean time: 171s
50th percentile time: 120s
90th percentile time: 452s
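For reference, a summary like the one above can be produced from a list of step durations with something along these lines. This is only a sketch: the sample durations are invented, the nearest-rank percentile definition is an assumption (the query actually used for these numbers isn't shown in this bug), and `percentile` and `summarize` are illustrative names.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list.

    pct is a whole number between 1 and 100.
    """
    # Integer ceiling of pct/100 * n, avoiding floating point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def summarize(durations):
    """Return count, mean, median and 90th percentile of step durations."""
    values = sorted(durations)
    return {
        'n': len(values),
        'mean': sum(values) / float(len(values)),
        '50th': percentile(values, 50),
        '90th': percentile(values, 90),
    }


# Invented durations in seconds, only to exercise the functions.
sample = [60, 75, 90, 120, 120, 150, 200, 300, 452, 900]
print(summarize(sample))
```

A mean well above the median, as in the hg_update numbers, points at a long tail: most checkouts are quick and a minority are very slow.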

Chris AtLee [:catlee]

Comment 56

•

12 years ago

I ran the same numbers for 'clean_old_builds' to see if it's a significant problem:

n: 39062
mean: 9.5s
50th: 1s
90th: 7s

Chris AtLee [:catlee]

Comment 57

•

12 years ago

and 'checking_clobber_times':

n: 17910
mean: 42.3s
50th: 1s
90th: 139s

Massimo Gervasini [:massimo]

Assignee

Comment 58

•

12 years ago

Attached patch added fix for undefined PYWIN32, removed license, fixed typo, minor fixes

In this patch:
* removed license headers
* now PYWIN32 is always defined in commands.py
* minor fixes

Attachment #742301 - Attachment is obsolete: true

Attachment #745320 - Flags: review?(catlee)

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #745320 - Flags: review?(catlee) → review+

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Attachment #745320 - Flags: checked-in+

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Attachment #738509 - Flags: checked-in+

Chris AtLee [:catlee]

Comment 59

•

12 years ago

in production

Gregory Szorc [:gps]

Reporter

Comment 60

•

12 years ago

From a Windows 7 opt try I just pushed:

command: START
command: hg --config extensions.purge= purge -a --all e:\builds\moz2_slave\try-w32-0000000000000000000000\build
command: cwd: e:\builds\moz2_slave\try-w32-0000000000000000000000
command: output:
command: END (127.06s elapsed)
command: START
command: hg update -C -r f4cceab72020554b302791c075249dc125813c7f
command: cwd: e:\builds\moz2_slave\try-w32-0000000000000000000000\build
command: output:
1583 files updated, 0 files merged, 151 files removed, 0 files unresolved
command: END (28.06s elapsed)

155s total to get the source tree in a sane state. Contrast with https://tbpl.mozilla.org/php/getParsedLog.php?id=22639341&tree=Try&full=1 which took 588s to rm -rf the source dir and 485s to |hg up| with an empty slate.

That's 1073s before and 155s currently. If those numbers are average, Windows clobber builds (all try builds) will now complete 918s == 15:18 faster. \o/ \o/
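The before/after arithmetic, spelled out as a quick sketch using only the timings quoted in this comment (`mmss` is just a helper name for this example):

```python
def mmss(seconds):
    """Format a whole number of seconds as M:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return "%d:%02d" % (minutes, secs)


# Timings quoted above, in seconds.
before = 588 + 485          # rm -rf of the source dir, then hg up from scratch
after = 127.06 + 28.06      # hg purge, then hg update -C
saved = before - round(after)

print(mmss(saved))
```

These are single-build samples, so the saving is only representative if, as the comment says, those numbers are average.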

Chris AtLee [:catlee]

Comment 61

•

12 years ago

Interesting that this works on win32 builds. It's failing on linux builds, and falling back to a fresh clobber.

Massimo Gervasini [:massimo]

Assignee

Comment 62

•

12 years ago

Attached patch fix for: "no repository found" error

now using dest as cwd for purge command.
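Some background on why cwd matters here, as a sketch rather than Mercurial's actual code: hg locates the repository by looking for a .hg directory in the working directory and then in each parent. The function below (`find_repo_root` is a made-up name) mimics that lookup. If the command runs from the parent of dest and no ancestor contains .hg, the search comes back empty, which would be consistent with a "no repository found" error.

```python
import os


def find_repo_root(start):
    """Return the nearest directory at or above start that contains .hg.

    Returns None when no such directory exists, which is the situation
    hg reports as "no repository found".
    """
    path = os.path.abspath(start)
    while True:
        if os.path.isdir(os.path.join(path, '.hg')):
            return path
        parent = os.path.dirname(path)
        if parent == path:      # reached the filesystem root
            return None
        path = parent
```

Running the purge with cwd=dest makes the search start inside the checkout, so it finds dest/.hg immediately.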

Attachment #746418 - Flags: review?(catlee)

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #746418 - Flags: review?(catlee) → review+

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Attachment #746418 - Flags: checked-in+

Justin Wood (:Callek)

Comment 63

•

12 years ago

massimo, this patch added 9 new pep8 violations: http://10.134.48.37:8080/job/tools_tests/194/

bhearsum@mozilla.com (:bhearsum)

Comment 64

•

12 years ago

bug 870323 has some fallout related to Python 2.7 specific syntax ("except foo as e" vs. "except foo, e")
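For code that has to run on both old and new interpreters, one way to sidestep the two spellings is to fetch the exception from sys.exc_info(). A minimal sketch (the function names are illustrative, not taken from the tools repo):

```python
import sys


def describe_failure(func):
    """Call func and return the message of any ValueError it raises."""
    try:
        func()
    except ValueError:
        # sys.exc_info() avoids both "except ValueError, e" (Python 2 only)
        # and "except ValueError as e" (rejected by older Python 2 releases).
        err = sys.exc_info()[1]
        return str(err)
    return None


def broken():
    raise ValueError("purge failed")


print(describe_failure(broken))
```

It is a little more verbose than either spelling, but it parses on every interpreter version.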

Massimo Gervasini [:massimo]

Assignee

Comment 65

•

12 years ago

Attached patch fix for pep8 violations

Callek, this patch fixes all the pep8 validation errors and removes some duplicated tests.

Attachment #748002 - Flags: review?(bugspam.Callek)

Justin Wood (:Callek)

Comment 66

•

12 years ago

Comment on attachment 748002
fix for pep8 violations

r+ on the pep8 fixups.

Can you please either point out what each of these tests dupes are, and then reflag me for review. Or flag someone else more familiar with those tests/code for review?

Attachment #748002 - Flags: review?(bugspam.Callek)

Attachment #748002 - Flags: review?

Attachment #748002 - Flags: review+

Massimo Gervasini [:massimo]

Assignee

Comment 67

•

12 years ago

(In reply to Justin Wood (:Callek) from comment #66)
> Comment on attachment 748002
> fix for pep8 violations
>
> r+ on the pep8 fixups.
>
> Can you please either point out what each of these tests dupes are, and then
> reflag me for review. Or flag someone else more familiar with those
> tests/code for review?

Hi Callek,

by mistake I added the following tests twice:
* testPurgeUntrackedDirectory
* testPurgeTrackedFile
* testPurgeUntrackedFile

I spotted them while fixing the pep8 errors.

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Attachment #748002 - Flags: review? → review?(bugspam.Callek)

Justin Wood (:Callek)

Comment 68

•

12 years ago

Comment on attachment 748002
fix for pep8 violations

Review of attachment 748002:
-----------------------------------------------------------------

Thanks for the explanation -- clearing my extra review request (since two r+'s by the same person just looks silly ;-) ) you're good to land this.

Thanks again for the fixup

Attachment #748002 - Flags: review?(bugspam.Callek)

Mike Hommey [:glandium]

Updated

•

12 years ago

Depends on: 873067

Gregory Szorc [:gps]

Reporter

Comment 69

•

12 years ago

I just saw an l10n build log where 9:45 of the 10:03 it took to manage source code was spent in hg up, doing a full/clobber update, of course. It looked like the entire job was taking about 20 minutes. So, giving l10n builds the same treatment should net major savings for l10n build times!

Massimo Gervasini [:massimo]

Assignee

Updated

•

12 years ago

Attachment #748002 - Flags: checked-in+

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

John Hopkins (:jhopkins)

Comment 70

•

12 years ago

Anything left to do here?

Flags: needinfo?(mgervasini)

Gregory Szorc [:gps]

Reporter

Comment 71

•

12 years ago

Didn't this whole thing get backed out?

Massimo Gervasini [:massimo]

Assignee

Comment 72

•

12 years ago

(In reply to John Hopkins (:jhopkins) from comment #70)
> Anything left to do here?

Hi John,

I don't know if it's worth extending the same procedure to nightly builds, but it's something we could try if this thing has not been backed out, as gps suggests.

Flags: needinfo?(mgervasini)

Gregory Szorc [:gps]

Reporter

Comment 73

•

12 years ago

You can also fix this for Git. I noticed Git checkouts for B2G builds are also in the 10+ minute range. e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=26921499&tree=Mozilla-Central&full=1 shows the source checkout lasting from 04:02:54 to 04:14:25.

(no longer active)

Updated

•

11 years ago

Blocks: 950850

Chris AtLee [:catlee]

Comment 74

•

11 years ago

Yes, this got backed out due to some intermittent bustage. Looks like an hg bug, but we could never reproduce.

Chris AtLee [:catlee]

Comment 75

•

11 years ago

For git, most of the time is spent fetching new tags, which we don't generally need to do. I have a patch for 'repo' that needs to get upstreamed which will avoid a fetch if the manifest specifies an absolute commit id rather than a refname. Combined with work in bug 899969 which will give us completely specified manifests, we should get much faster b2g checkouts.
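The idea, sketched in Python for illustration. This is not the 'repo' patch itself: `is_commit_id` and `needs_fetch` are made-up names, and `have_commit` is a hypothetical callable that would check the local clone for the commit.

```python
import re

_SHA1_RE = re.compile(r'^[0-9a-f]{40}$')


def is_commit_id(revision):
    """True if revision looks like a full 40-character SHA-1 commit id."""
    return bool(_SHA1_RE.match(revision))


def needs_fetch(revision, have_commit):
    """Decide whether a project has to be fetched before checkout.

    have_commit is a callable telling whether the local clone already
    contains the given commit. Ref names can move, so they always need a
    fetch; an absolute commit id only needs one when it is missing locally.
    """
    if is_commit_id(revision):
        return not have_commit(revision)
    return True
```

This only pays off when the manifest pins every project to a commit id, which is why it depends on the fully specified manifests mentioned above.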

Chris AtLee [:catlee]

Updated

•

11 years ago

Depends on: 969689, 883918

Chris AtLee [:catlee]

Comment 76

•

11 years ago

Given the work in bug 969689, let's give this another shot.

Re-enabled purge on try here:
https://hg.mozilla.org/build/buildbotcustom/rev/6c60d0187e8a

This will go live after the next reconfig.

bhearsum@mozilla.com (:bhearsum)

Comment 77

•

11 years ago

(In reply to Chris AtLee [:catlee] from comment #76)
> Given the work in bug 969689, let's give this another shot.
>
> Re-enabled purge on try here:
> https://hg.mozilla.org/build/buildbotcustom/rev/6c60d0187e8a
>
> This will go live after the next reconfig.

in production, godspeed!

Pete Moore [:pmoore][:pete]

Comment 78

•

11 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #77)
> in production, godspeed!

Should this bug be closed now? I'm aware we still have "clobber issues" but I'm not sure if they are the same ones that this bug aimed to solve.

Flags: needinfo?(catlee)

Pete Moore [:pmoore][:pete]

Comment 79

•

11 years ago

gps: ^^

Flags: needinfo?(catlee) → needinfo?(gps)

Gregory Szorc [:gps]

Reporter

Comment 80

•

11 years ago

If purge (instead of rm) is being used in production, then yes, this should be closed.

Flags: needinfo?(gps)

Massimo Gervasini [:massimo]

Assignee

Comment 81

•

10 years ago

let's close this

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: General Automation → General

* Part 1: Add purge ability to hgtool, v1 (Gregory Szorc [:gps], 12 years ago, 5.16 KB, patch); bhearsum: review-
* Part 2: Purge, don't clobber, try builds, v1 (Gregory Szorc [:gps], 12 years ago, 2.74 KB, patch)
* look for inefficient clobbers (bhearsum@mozilla.com (:bhearsum), 12 years ago, 694 bytes, patch)
* based on gps patch: renamed do_purge, added tests (Massimo Gervasini [:massimo], 12 years ago, 6.50 KB, patch); catlee: feedback-
* enabling hg --purge on try branch (Massimo Gervasini [:massimo], 12 years ago, 1.55 KB, patch); catlee: feedback+, massimo: checked-in+
* better handling for windows long paths, added tests (Massimo Gervasini [:massimo], 12 years ago, 10.40 KB, patch)
* updated _rmtree_windows(), added cwd in purge command (Massimo Gervasini [:massimo], 12 years ago, 11.96 KB, patch); catlee: feedback+
* added fix for undefined PYWIN32, removed license, fixed typo, minor fixes (Massimo Gervasini [:massimo], 12 years ago, 11.90 KB, patch); catlee: review+, massimo: checked-in+
* fix for: "no repository found" error (Massimo Gervasini [:massimo], 12 years ago, 526 bytes, patch); catlee: review+, massimo: checked-in+
* fix for pep8 violations (Massimo Gervasini [:massimo], 12 years ago, 4.42 KB, patch); Callek: review+, massimo: checked-in+