Closed Bug 884028 Opened 6 years ago Closed 6 years ago

Mach sometimes hits an IOError when building rapidly

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla24

People

(Reporter: johns, Assigned: gps)

References

Details

(Whiteboard: [mach])

Attachments

(1 file)

While compiling with -j64 in a directory that would have a 100% ccache hitrate, causing really fast file spew:

>  1:35.54 nsMacGreekToUnicode.cpp
>  1:35.54 nsMacGujaratiToUnicode.cpp
>  1:35.55 nsMacGurmukhiToUnicode.cpp
>  1:35.55 nsMacHebrewToUnicode.cpp
>  1:35.55 nsMacIcelandicToUnicode.cpp
> TIER: base nspr nss js platform app SUBTIER: export libs tools DIRECTORIES: 03/81 (intl)Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.7/threading.py", line 761, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/home/johns/moz/moz-git/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 698, in _processOutput
>     self.processOutputLine(line.rstrip())
>   File "/home/johns/moz/moz-git/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 663, in processOutputLine
>     handler(line)
>   File "/home/johns/moz/moz-git/python/mach/mach/mixin/process.py", line 86, in handleLine
>     line_handler(line)
>   File "/home/johns/moz/moz-git/python/mozbuild/mozbuild/mach_commands.py", line 244, in on_line
>     self.log(logging.INFO, 'build_output', {'line': line}, '{line}')
>   File "/home/johns/moz/moz-git/python/mach/mach/mixin/logging.py", line 54, in log
>     extra={'action': action, 'params': params})
>   File "/usr/lib/python2.7/logging/__init__.py", line 1213, in log
>     self._log(level, msg, args, **kwargs)
>   File "/usr/lib/python2.7/logging/__init__.py", line 1268, in _log
>     self.handle(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 1278, in handle
>     self.callHandlers(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 1318, in callHandlers
>     hdlr.handle(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 749, in handle
>     self.emit(record)
>   File "/home/johns/moz/moz-git/python/mozbuild/mozbuild/mach_commands.py", line 73, in emit
>     self.fh.write('\n')
> IOError: [Errno 5] Input/output error

mach then sat hung until interrupted. Resuming the build worked as expected.
Achievement unlocked.

I have no clue what's going on here. It smells like a multi-threading issue, but the GIL should protect us. Hmm...
This does look vaguely though frighteningly familiar.  What does `python --version` say?
(In reply to Jeff Hammel [:jhammel] from comment #2)
> This does look vaguely though frighteningly familiar.  What does `python
> --version` say?

Python 2.7.5
Hmmm.....well, that should be about as good as possible :(  Searching near and far has so far not yielded me results.
I'm guessing this is a multi-threading issue that occurs when trying to
.write() to a fd when a write is in progress. Python's logging
infrastructure has the ability to lock, so let's use it.
Attachment #764241 - Flags: review?(ted)
Assignee: nobody → gps
Attachment #764241 - Flags: review?(ted) → review+
Also, -j64 is insane. I have data showing that -j greater than number of cores doesn't yield a speedup. If -j is *much* greater than number of cores, you're just creating excessive context switching and resource contention and likely making the build slower as a result. I welcome data proving me wrong. You can use my patches in bug 883209 to record and visualize resource usage when building.
Moving component because this was an issue with the |mach build| command and not the mach core.
Status: NEW → ASSIGNED
Component: mach → Build Config
Whiteboard: [mach]
(In reply to Gregory Szorc [:gps] from comment #7)
> Also, -j64 is insane. I have data showing that -j greater than number of
> cores doesn't yield a speedup. If -j is *much* greater than number of cores,
> you're just creating excessive context switching and resource contention and
> likely making the build slower as a result. I welcome data proving me wrong.
> You can use my patches in bug 883209 to record and visualize resource usage
> when building.

I have -j64 because I'm using distcc with 32 cores. It actually does hurt a bit when doing non-distributable things like idl processing (it'd be neat to have separate job limits for compiler vs other jobs) but is needed to saturate the distcc network
https://hg.mozilla.org/mozilla-central/rev/aa1227e3d6ba
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Blocks: 886498
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.