Closed
Bug 732025
Opened 13 years ago
Closed 12 years ago
post_upload.py should use hardlinks where possible
Categories
(Release Engineering :: General, enhancement, P3)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: catlee)
References
Details
(Whiteboard: [ftp][postupload])
Attachments
(2 files, 1 obsolete file)
2.89 KB,
patch
|
nthomas
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
1.08 KB,
patch
|
bhearsum
:
review+
nthomas
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
In many cases, post_upload.py is copying the same file to multiple locations on disk. It would be more efficient to use hard links if we're copying to the same partition.
Updated•13 years ago
|
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Assignee | ||
Updated•13 years ago
|
Severity: normal → enhancement
Priority: -- → P3
Assignee | ||
Comment 1•13 years ago
|
||
this maintains a cache per run of post_upload.py of which files have been copied in, and tries to use hardlinks when the same file is being copied multiple times.
os.link raises an OSError in the case where new_file already exists, or if new_file and _linkCache[original_file] are on different partitions. In either case we fall back to the old style behaviour.
I've switched the old style behaviour to using temporary files in the destination directory instead of temporary sub directories, because I think this will result in less churn on the directory itself.
Attachment #603451 -
Flags: feedback?
Assignee | ||
Updated•13 years ago
|
Attachment #603451 -
Flags: feedback? → feedback?(nrthomas)
Comment 2•13 years ago
|
||
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first
>@@ -60,22 +63,32 @@ def CopyFileToDir(original_file, source_dir, dest_dir, preserve_dirs=False):
>- return
>- tmpdir = tempfile.mkdtemp(prefix=".~", dir=dest_dir)
>- tmp_path = os.path.join(tmpdir, os.path.basename(new_file))
>- shutil.copyfile(original_file, tmp_path)
[snip]
>+ tmp_fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
>+ tmp_fp = os.fdopen(tmp_fd, 'wb')
>+ shutil.copyfileobj(open(original_file, 'rb'), tmp_fp)
>+ tmp_fp.close()
> os.rename(tmp_path, new_file)
>- os.rmdir(tmpdir)
So this change would avoid the temporary directory and just have a temp file. Are there any other benefits ? What is fp short for ?
>+ _linkCache[original_file] = new_file
original_file is an absolute path, so if we call CopyFileToDir several times on how do we get any hits on the cache ? Probably would work if you basename first.
Comment 3•13 years ago
|
||
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first
feedback+ on the idea, but see the questions on the implementation in the previous comment.
Looks like this will mostly help with the duplication of nightlies in the dated, latest, and tinderbox-builds. I did a quick test and moving them out of tinderbox-builds/latest and the dated file is fine, as you'd expect.
Attachment #603451 -
Flags: feedback?(nrthomas) → feedback+
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to Nick Thomas [:nthomas] from comment #2)
> Comment on attachment 603451 [details] [diff] [review]
> try to hardlink first
>
> >@@ -60,22 +63,32 @@ def CopyFileToDir(original_file, source_dir, dest_dir, preserve_dirs=False):
> >- return
> >- tmpdir = tempfile.mkdtemp(prefix=".~", dir=dest_dir)
> >- tmp_path = os.path.join(tmpdir, os.path.basename(new_file))
> >- shutil.copyfile(original_file, tmp_path)
> [snip]
> >+ tmp_fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
> >+ tmp_fp = os.fdopen(tmp_fd, 'wb')
> >+ shutil.copyfileobj(open(original_file, 'rb'), tmp_fp)
> >+ tmp_fp.close()
> > os.rename(tmp_path, new_file)
> >- os.rmdir(tmpdir)
>
> So this change would avoid the temporary directory and just have a temp
> file. Are there any other benefits ? What is fp short for ?
I believe that not using a temporary directory reduces the number of extra directory entries in the parent. When using temp dirs you have at most one temp dir and one file per file actively being copied in. When using temp files, I think you only have the one temp file. I hope this reduces the churn on the directory file itself.
> >+ _linkCache[original_file] = new_file
>
> original_file is an absolute path, so if we call CopyFileToDir several times
> on how do we get any hits on the cache ? Probably would work if you basename
> first.
original_file would be a path in /tmp, wherever 'make upload' put them. it will be the same for multiple calls to CopyFileToDir in a single invocation of post_upload.py
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → catlee
Comment 5•13 years ago
|
||
Ok, sounds good.
Assignee | ||
Comment 6•13 years ago
|
||
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first
staging run and local testing look good.
Attachment #603451 -
Flags: review?(nrthomas)
Updated•13 years ago
|
Attachment #603451 -
Flags: review?(nrthomas) → review+
Assignee | ||
Comment 7•13 years ago
|
||
similar to before except keep a history of previous locations we've copied to instead of just the most recent one. this protects against worst case scenarios where a file is copied to partitions in order A,B,A.
Attachment #603451 -
Attachment is obsolete: true
Attachment #628502 -
Flags: review?(nrthomas)
Comment 8•13 years ago
|
||
Comment on attachment 628502 [details] [diff] [review]
hardlink harder
Looks fine too.
Attachment #628502 -
Flags: review?(nrthomas) → review+
Assignee | ||
Comment 9•13 years ago
|
||
Comment on attachment 628502 [details] [diff] [review]
hardlink harder
checked into tools repo. will do svn deployment tomorrow
Attachment #628502 -
Flags: checked-in+
Assignee | ||
Comment 10•13 years ago
|
||
bounced in deployment because permissions don't get set correctly
Assignee | ||
Comment 11•13 years ago
|
||
right now we rely on umask to have files that are readable. I'm not sure why we haven't tried to set permissions explicitly in post_upload.py before, but this patch sets new files to 0644. the previous version failed because it uses mkstemp() to create the new files, which have permissions 0600.
Attachment #628774 -
Flags: review?(nrthomas)
Attachment #628774 -
Flags: review?(bhearsum)
Comment 12•13 years ago
|
||
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly
Review of attachment 628774 [details] [diff] [review]:
-----------------------------------------------------------------
I briefly looked at some existing files and couldn't find anything that wasn't 644, so I think this is safe to do...
Attachment #628774 -
Flags: review?(bhearsum) → review+
Comment 13•12 years ago
|
||
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly
We could go a step further and ensure the group ownership is set to the product, to ensure the primary group is set properly, but with files getting 644 and dirs getting 755 that's not really necessary.
Attachment #628774 -
Flags: review?(nrthomas) → review+
Assignee | ||
Comment 14•12 years ago
|
||
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly
will wait for current releases to finish before deploying with svn
Attachment #628774 -
Flags: checked-in+
Assignee | ||
Comment 15•12 years ago
|
||
deployed via svn/puppet
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Comment 16•11 years ago
|
||
FYI, recent changes in how the directories are broken up into several partitions negate the hardlinking added here. eg firefox/tinderbox-builds and firefox/nightly are now different partitions, where previously they were on a single partition.
Updated•7 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•