Closed Bug 1368057 Opened 2 years ago Closed 2 years ago

Mach makes link failures annoyingly slow with xterm on Mac

Categories

(Firefox Build System :: General, defect)

53 Branch
defect
Not set

Tracking

(firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: bzbarsky, Assigned: gps)

Details

Attachments

(1 file)

This has been bothering me for a while; I finally took the time to make some measurements.

STEPS TO REPRODUCE:

1)  Create a clean Firefox build.
2)  Open dom/base/Element.cpp in an editor.
3)  Delete of #if 0 out the function with this declaration:

      bool
      Element::GetAttr(int32_t aNameSpaceID, nsIAtom* aName,
                       nsAString& aResult) const
4)  time mach build dom/base

ACTUAL RESULTS: The link failure causes us to show the contents of $objdir/toolkit/library/${RANDOM}.list, which is a long list of .o files.  It takes 35+ seconds to show this list over here.  During this time, we're not even CPU-bound; I'm seeing 5s user time, 1s system time, 18% CPU utilization.

EXPECTED RESULTS: It shouldn't take nearly this long to show the list.  The list has about 1900 entries.  We're taking about 35 seconds, of which about 5 seconds is real work (more on this below).  So we're showing 63 list entries per second, or one entry per 16ms.  There's no way we should need 16ms of processing per entry.  This is suspicious; see below.

Some additional information on the performance:

1)  If I pass --log-no-times to mach, that does not change timing.
2)  If I pipe the entire operation through cat, while still logging times,
    the entire thing takes about 4s (according to mach's own timing).
3)  If I change _colorize in python/mach/mach/logging.py to just return s
    immediately, times are unchanged.
4)  If I change the format() function in python/mach/mach/logging.py to just
    set "t" to format_seconds(self._time(record)) instead of doing the
    terminal.blue() thing, then times are unchanged (though the coloring
    goes away, as expected).
5)  If I use Terminal.app instead of xterm, time is about 4s with the coloring
    either on or off.
6)  If I completely cover up my xterm window right after I start the mach build,
    things take 4s (!).

So at a guess, in the normal mode we're doing some sort of terminal flush at every line that causes the terminal to wait for vsync or something.  That would be consistent with the 16ms-per-line thing, as well as the "CPU usage is actually low" thing.

I'm using XQuartz 2.7.11 for what it's worth.  I do think this might have gotten worse the last time I updated XQuartz a few months ago.
Fun fact: one of the reasons I pushed to make `make -s` the default ~6 years ago was because of performance issues with terminals, not unlike what you've described here. Writing to terminals can just be slow - even if it is plain text without any cosmetics such as coloring, footers, etc.

To confirm your theory about a flush, could you comment out lines in python/mozbuild/mozbuild/mach_commands.py in the emit() function around line 110 related to the footer and the flush() call required to display it properly? The final function should look something like:

    def emit(self, record):
        msg = self.format(record)
        self.acquire()

        try:
            self.fh.write(msg)
            self.fh.write('\n')
        finally:
            self.release()
Flags: needinfo?(bzbarsky)
If I make that change, time is about 6s.  So yes, that seems to fix the problem, pretty much.

Also, it's not the flush() call that's being slow.  Nor the draw() call.  It's the clear() call.

And in there, it's the clear_eos() call that's slow.

Note that in this case self.tiers is falsy, so the following draw() call is a no-op anyway.  In fact, as far as I can tell self.tiers is always falsy for me when draw() is called when doing a non-toplevel build like this.
Flags: needinfo?(bzbarsky)
https://hg.mozilla.org/mozilla-central/log/0e9853e31da9/python/mozbuild/mozbuild/mach_commands.py?patch=&linerange=130:199 contains the full history of this code, which is a useful history lesson.

I was immediately thinking "that clear_eos()" shouldn't be necessary: we should just clear to end of line. But then I saw bug 875684.

Anyway, I've tested replacing clear_eos() with clear_eol() with multiple terminals and it appears to "just work" without triggering the badness from bug 875684. (I think another refactor between that bug and today related to cursor positioning enabled this.)

So I'm going to submit a patch to switch to clear_eol(). I suspect this will make the perf issue go away because I suspect the "clear until end of screen" is being handled sub-optimally compared to "clear until end of line."
Assignee: nobody → gps
Status: NEW → ASSIGNED
Yes, clear_eol() is much faster over here.  Thank you!
Comment on attachment 8873567 [details]
Bug 1368057 - Clear terminal footer to end of line, not screen;

https://reviewboard.mozilla.org/r/144952/#review149126

The worse that can happen here is that the footer spans multiple lines, and this will not work terribly right in that case, but it already doesn't work well in that case anyways.
Attachment #8873567 - Flags: review?(mh+mozilla) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8fe7c30a2202
Clear terminal footer to end of line, not screen; r=glandium
Thank you for helping us track this down, bz!

For the record, it pains me to think of how many seconds of people time were wasted throughout the years because of this. 16ms per line of build output multiplied by millions of lines (divided by fraction of affected terminals) could be quite a large number :/

But the important thing is it is fixed now.
Yep.  I think for the non-link-error case it's less obvious that it was a problem at all since most things we do with the build take more than 16ms per line anyway.
https://hg.mozilla.org/mozilla-central/rev/8fe7c30a2202
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.