Open Bug 1177336 Opened 4 years ago Updated 4 years ago

mozfile.remove uses excessive system calls

Categories

(Testing :: Mozbase, defect)

defect
Not set

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned, Mentored)

Details

Attachments

(1 file)

Patch forthcoming.
Bug 1177336 - Reduce system calls in mozfile.remove; r?ted

The os.path.* functions tend to perform os.stat() function calls behind
the scenes. Python doesn't cache stat() results, so many of these
function calls resulted in Python performing system calls. These system
calls aren't exactly free. And, when we perform thousands or even
hundreds of thousands of them (as we do when performing clobbers), they
add up, especially on Windows.

This commit refactors the code in mozfile.remove() to remove a
significant number of stat() system calls. We do this by storing the
initial stat() result and using the various stat.S_* functions to query
its state rather than use the os.path.* functions.

This code still isn't optimal. The os.walk() in particular is known to
be very inefficient because its implementation will perform an os.stat()
on each directory entry to see whether it is a file or directory. And,
this stat() result is not passed to the caller, requiring it to perform
another os.stat() to query additional metadata. The os.scandir()
function (available in Python 3.5 and as a PyPI package) solves this
properly. But we stop short of importing it into the tree and
introducing a new dependency.

On my MacBook Pro, this reduces the clobber of a Firefox objdir from
~6.8s to ~5.3s. Not a bad improvement! I suspect the improvement on
Windows will be even more pronounced, as os.stat() there tends to be
quite slow.
Attachment #8626054 - Flags: review?(ted)
For Windows we should just fix bug 1162519.
Attachment #8626054 - Flags: review?(ted) → review+
Comment on attachment 8626054 [details]
MozReview Request: Bug 1177336 - Reduce system calls in mozfile.remove; r?ted

https://reviewboard.mozilla.org/r/11987/#review10839

Ship It!

::: testing/mozbase/mozfile/mozfile/mozfile.py:209
(Diff revision 1)
> -    if os.path.isfile(path) or os.path.islink(path):
> +    is_link = os.path.islink(path)

Would you rather just call `os.lstat` instead of `os.stat` above? (I'd guess that's what islink calls under the hood anyway.)
https://reviewboard.mozilla.org/r/11987/#review10839

> Would you rather just call `os.lstat` instead of `os.stat` above? (I'd guess that's what islink calls under the hood anyway.)

I had that coded at one point. However, this part of the code only runs on the main directory being deleted. I didn't feel the extra complexity was worth saving a single system call.

FWIW, I have more ideas for this code. Maybe I'll find time to refactor it a bit again.
url:        https://hg.mozilla.org/integration/fx-team/rev/5154443dcbeddda7580673e076d7b5e6d8b76542
changeset:  5154443dcbeddda7580673e076d7b5e6d8b76542
user:       Gregory Szorc <gps@mozilla.com>
date:       Wed Jul 01 09:58:58 2015 -0700
description:
Bug 1177336 - Reduce system calls in mozfile.remove; r=ted

The os.path.* functions tend to perform os.stat() function calls behind
the scenes. Python doesn't cache stat() results, so many of these
function calls resulted in Python performing system calls. These system
calls aren't exactly free. And, when we perform thousands or even
hundreds of thousands of them (as we do when performing clobbers), they
add up, especially on Windows.

This commit refactors the code in mozfile.remove() to remove a
significant number of stat() system calls. We do this by storing the
initial stat() result and using the various stat.S_* functions to query
its state rather than use the os.path.* functions.

This code still isn't optimal. The os.walk() in particular is known to
be very inefficient because its implementation will perform an os.stat()
on each directory entry to see whether it is a file or directory. And,
this stat() result is not passed to the caller, requiring it to perform
another os.stat() to query additional metadata. The os.scandir()
function (available in Python 3.5 and as a PyPI package) solves this
properly. But we stop short of importing it into the tree and
introducing a new dependency.

On my MacBook Pro, this reduces the clobber of a Firefox objdir from
~6.8s to ~5.3s. Not a bad improvement! I suspect the improvement on
Windows will be even more pronounced, as os.stat() there tends to be
quite slow.
Assignee: nobody → gps
gps, is this something a contributor could pick up and finish?
Flags: needinfo?(gps)
Yes, this is contributor worthy.
Assignee: gps → administration
Mentor: gps
Flags: needinfo?(gps)
You need to log in before you can comment on or make changes to this bug.