If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

post_upload.py should use hardlinks where possible

RESOLVED FIXED

Status

Release Engineering
General Automation
P3
enhancement
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: catlee, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [ftp][postupload])

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

6 years ago
In many cases, post_upload.py is copying the same file to multiple locations on disk. It would be more efficient to use hard links if we're copying to the same partition.

Updated

6 years ago
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
(Assignee)

Updated

6 years ago
Severity: normal → enhancement
Priority: -- → P3
(Assignee)

Comment 1

6 years ago
Created attachment 603451 [details] [diff] [review]
try to hardlink first

this maintains a cache per run of post_upload.py of which files have been copied in, and tries to use hardlinks when the same file is being copied multiple times.

os.link raises an OSError in the case where new_file already exists, or if new_file and _linkCache[original_file] are on different partitions. In either case we fall back to the old style behaviour.

I've switched the old style behaviour to using temporary files in the destination directory instead of temporary sub directories, because I think this will result in less churn on the directory itself.
Attachment #603451 - Flags: feedback?
(Assignee)

Updated

6 years ago
Attachment #603451 - Flags: feedback? → feedback?(nrthomas)
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first

>@@ -60,22 +63,32 @@ def CopyFileToDir(original_file, source_dir, dest_dir, preserve_dirs=False):
>-            return 
>-    tmpdir = tempfile.mkdtemp(prefix=".~", dir=dest_dir)
>-    tmp_path = os.path.join(tmpdir, os.path.basename(new_file))
>-    shutil.copyfile(original_file, tmp_path)
[snip]
>+    tmp_fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
>+    tmp_fp = os.fdopen(tmp_fd, 'wb')
>+    shutil.copyfileobj(open(original_file, 'rb'), tmp_fp)
>+    tmp_fp.close()
>     os.rename(tmp_path, new_file)
>-    os.rmdir(tmpdir)

So this change would avoid the temporary directory and just have a temp file. Are there any other benefits ? What is fp short for ?

>+    _linkCache[original_file] = new_file

original_file is an absolute path, so if we call CopyFileToDir several times on how do we get any hits on the cache ? Probably would work if you basename first.
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first

feedback+ on the idea, but see the questions on the implementation in the previous comment. 

Looks like this will mostly help with the duplication of nightlies in the dated, latest, and tinderbox-builds. I did a quick test and moving them out of tinderbox-builds/latest and the dated file is fine, as you'd expect.
Attachment #603451 - Flags: feedback?(nrthomas) → feedback+
(Assignee)

Comment 4

6 years ago
(In reply to Nick Thomas [:nthomas] from comment #2)
> Comment on attachment 603451 [details] [diff] [review]
> try to hardlink first
> 
> >@@ -60,22 +63,32 @@ def CopyFileToDir(original_file, source_dir, dest_dir, preserve_dirs=False):
> >-            return 
> >-    tmpdir = tempfile.mkdtemp(prefix=".~", dir=dest_dir)
> >-    tmp_path = os.path.join(tmpdir, os.path.basename(new_file))
> >-    shutil.copyfile(original_file, tmp_path)
> [snip]
> >+    tmp_fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
> >+    tmp_fp = os.fdopen(tmp_fd, 'wb')
> >+    shutil.copyfileobj(open(original_file, 'rb'), tmp_fp)
> >+    tmp_fp.close()
> >     os.rename(tmp_path, new_file)
> >-    os.rmdir(tmpdir)
> 
> So this change would avoid the temporary directory and just have a temp
> file. Are there any other benefits ? What is fp short for ?

I believe that not using a temporary directory reduces the number of extra directory entries in the parent. When using temp dirs you have at most one temp dir and one file per file actively being copied in. When using temp files, I think you only have the one temp file. I hope this reduces the churn on the directory file itself.

> >+    _linkCache[original_file] = new_file
> 
> original_file is an absolute path, so if we call CopyFileToDir several times
> on how do we get any hits on the cache ? Probably would work if you basename
> first.

original_file would be a path in /tmp, wherever 'make upload' put them. it will be the same for multiple calls to CopyFileToDir in a single invocation of post_upload.py
(Assignee)

Updated

6 years ago
Assignee: nobody → catlee
Ok, sounds good.
(Assignee)

Comment 6

5 years ago
Comment on attachment 603451 [details] [diff] [review]
try to hardlink first

staging run and local testing look good.
Attachment #603451 - Flags: review?(nrthomas)

Updated

5 years ago
Attachment #603451 - Flags: review?(nrthomas) → review+
(Assignee)

Comment 7

5 years ago
Created attachment 628502 [details] [diff] [review]
hardlink harder

similar to before except keep a history of previous locations we've copied to instead of just the most recent one. this protects against worst case scenarios where a file is copied to partitions in order A,B,A.
Attachment #603451 - Attachment is obsolete: true
Attachment #628502 - Flags: review?(nrthomas)
Comment on attachment 628502 [details] [diff] [review]
hardlink harder

Looks fine too.
Attachment #628502 - Flags: review?(nrthomas) → review+
(Assignee)

Comment 9

5 years ago
Comment on attachment 628502 [details] [diff] [review]
hardlink harder

checked into tools repo. will do svn deployment tomorrow
Attachment #628502 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Blocks: 760059
(Assignee)

Comment 10

5 years ago
bounced in deployment because permissions don't get set correctly
(Assignee)

Comment 11

5 years ago
Created attachment 628774 [details] [diff] [review]
set permissions explicitly

right now we rely on umask to have files that are readable. I'm not sure why we haven't tried to set permissions explicitly in post_upload.py before, but this patch sets new files to 0644. the previous version failed because it uses mkstemp() to create the new files, which have permissions 0600.
Attachment #628774 - Flags: review?(nrthomas)
Attachment #628774 - Flags: review?(bhearsum)
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly

Review of attachment 628774 [details] [diff] [review]:
-----------------------------------------------------------------

I briefly looked at some existing files and couldn't find anything that wasn't 644, so I think this is safe to do...
Attachment #628774 - Flags: review?(bhearsum) → review+
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly

We could go a step further and ensure the group ownership is set to the product, to ensure the primary group is set properly, but with files getting 644 and dirs getting 755 that's not really necessary.
Attachment #628774 - Flags: review?(nrthomas) → review+
(Assignee)

Comment 14

5 years ago
Comment on attachment 628774 [details] [diff] [review]
set permissions explicitly

will wait for current releases to finish before deploying with svn
Attachment #628774 - Flags: checked-in+
(Assignee)

Comment 15

5 years ago
deployed via svn/puppet
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
FYI, recent changes in how the directories are broken up into several partitions negate the hardlinking added here. eg firefox/tinderbox-builds and firefox/nightly are now different partitions, where previously they were on a single partition.
You need to log in before you can comment on or make changes to this bug.