Closed
Bug 803530
Opened 12 years ago
Closed 12 years ago
dxr-mozilla-central builds failing after Oct 10-12th with "command timed out: 3600 seconds without output"
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
(Whiteboard: [dxr])
DXR builds are currently failing with:
{
...
...
webapprt.cpp
/builds/slave/dxr-mozilla-central/dxr-build-env/src/webapprt/components.manifest: WARNING: no useful preprocessor directives found
/builds/slave/dxr-mozilla-central/dxr-build-env/src/webapprt/ContentPolicy.js: WARNING: no preprocessor directives found
/builds/slave/dxr-mozilla-central/dxr-build-env/src/webapprt/ContentPermission.js: WARNING: no preprocessor directives found
/builds/slave/dxr-mozilla-central/dxr-build-env/src/webapprt/CommandLineHandler.js: WARNING: no preprocessor directives found
/builds/slave/dxr-mozilla-central/dxr-build-env/src/webapprt/DirectoryProvider.js: WARNING: no preprocessor directives found
AboutRedirector.cpp
DirectoryProvider.cpp
nsPrivateBrowsingServiceWrapper.cpp
nsFeedSniffer.cpp
/builds/slave/dxr-mozilla-central/dxr-build-env/src/config/makefiles/mochitest.mk:47: browser_forgetthissite_single.js temporarily disabled because of very frequent oranges, see bug 551540
/builds/slave/dxr-mozilla-central/dxr-build-env/src/config/makefiles/mochitest.mk:47: browser_sidebarpanels_click.js temporarily disabled cause it breaks the treeview, see bug 658744
/builds/slave/dxr-mozilla-central/dxr-build-env/src/config/rules.mk:1649: browser_forgetthissite_single.js temporarily disabled because of very frequent oranges, see bug 551540
/builds/slave/dxr-mozilla-central/dxr-build-env/src/config/rules.mk:1649: browser_sidebarpanels_click.js temporarily disabled cause it breaks the treeview, see bug 658744
/builds/slave/dxr-mozilla-central/dxr-build-env/src/config/rules.mk:1655: browser_forgetthissite_single.js temporarily disabled because of very frequent oranges, see bug 551540
/build
command timed out: 3600 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=9174.435886
========= Finished 'scripts/scripts/dxr/dxr.sh' failed (results: 2, elapsed: 2 hrs, 32 mins, 54 secs) (at 2012-10-18 05:40:05.617573) =========
}
eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16229815&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=16194482&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=16155389&tree=Firefox
First known bad:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16041442&tree=Firefox
Last known good:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16009709&tree=Firefox
Comment 1•12 years ago
How long should this be taking without producing any output?
Severity: critical → major
Whiteboard: [dxr]
Comment 2•12 years ago
Hmm, that "last known good" shouldn't be good at all (the FF compile failed), which probably means that the build script doesn't have exit-on-error enabled somewhere. The logs appear to indicate that the build succeeds and the post-processing is what times out.
Jonas, what kind of intermediate output does your build.sh script dump out?
Comment 3•12 years ago
build.sh essentially sets some environment variables and calls dxr/update.sh with the config file and tree. So it dumps all the useless compiler warnings and the warnings from the DXR clang plugin.
But I think catlee added a grep at the invocation of build.sh, to eat all the "Unprocessekind..." warnings from the dxr-clang plugin, so they wouldn't fill up the log and cause problems.
With regard to exit-on-error, this was a minimum effort to get DXR deployed. So I wouldn't be surprised if the build could fail silently, although we do check the size of the generated tarball afterwards.
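The two safeguards discussed in comments 2 and 3 (stopping at the first failing step, and sanity-checking the size of the generated tarball) could be sketched roughly as follows. This is an illustration only, not the actual build.sh or dxr.sh logic: `run_steps`, `check_tarball`, and the `min_bytes` threshold are hypothetical names introduced here.

```python
import os
import subprocess


def run_steps(steps):
    """Run each command in order and stop at the first failure.

    `steps` is a list of argument lists. check=True makes subprocess.run
    raise CalledProcessError on a non-zero exit code, which is the Python
    counterpart of a shell script running with exit-on-error.
    """
    for cmd in steps:
        subprocess.run(cmd, check=True)


def check_tarball(path, min_bytes):
    """Return the tarball size, or raise if it is suspiciously small.

    `min_bytes` is an assumed threshold; the bug does not say what size
    the real check uses.
    """
    size = os.path.getsize(path)
    if size < min_bytes:
        raise RuntimeError(
            "%s is only %d bytes (expected at least %d)" % (path, size, min_bytes)
        )
    return size
```

With a runner like this, a failed compile would abort the job immediately instead of letting later steps run against a broken tree, and the size check would remain as a second line of defence.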
Comment 4•12 years ago
The dxr compile for today seems to have passed, but not the one for the past few days. The time in the successful compile appears to be about 15 minutes shorter than the unsuccessful one, so it seems very likely that the time for the post-processing is running around the 1 hour mark that is the cutoff for killing processes with no output.
I worked with catlee a bit today on debugging this. Most likely, the fact that DXR produces sparing output, combined with the I/O being unflushed (!), means that DXR never appears to make any progress. Calling sys.stdout.flush() after most of the print statements should break the running time into small enough chunks that buildbot won't think it's stuck in an infinite loop...
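A minimal sketch of the flushing approach suggested in comment 4, assuming a small logging helper is acceptable. The `log` function is a hypothetical name and not part of DXR; the sketch is written for Python 3, whereas the code in question would have been Python 2 at the time.

```python
import sys


def log(message, stream=None):
    """Write a progress message and flush it straight away.

    When stdout is a pipe (as it is under a build harness), Python
    block-buffers it by default, so messages can sit in the buffer for a
    long time. Flushing after each message makes every line visible to
    the parent process as soon as it is written.
    """
    stream = sys.stdout if stream is None else stream
    stream.write(message + "\n")
    stream.flush()
```

In Python 3.3 and later, `print(message, flush=True)` has the same effect, and running the interpreter with `-u` disables the buffering for the whole process.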
Updated•12 years ago
Priority: -- → P2
Comment 5•12 years ago
The build for 2012-10-24 also failed, so it's definitely not fixed.
Comment 6•12 years ago
I pushed a change which flushes, as mentioned in comment 4. Let's see if this gets things to work.
Comment 7•12 years ago
The first run ran out of disk space. A retry was set, but it landed on the same slave that the previous run had broken. Better luck tomorrow.
Comment 9•12 years ago
(In reply to Joshua Cranmer [:jcranmer] from comment #6)
> I pushed a change which flushes, as mentioned in comment 4. Let's see if
> this gets things to work.
This appears to not have worked. Is it possible to try bumping the timeout to 2 hours without output?
Comment 10•12 years ago
I bumped the timeout to 2 hours. This will take effect after the next reconfig.
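The limit being raised here is an idle timeout ("seconds without output"), not a cap on total run time, which is why a job can run for over two hours and still be killed by a one-hour limit. A rough sketch of that semantics follows. It is not buildbot's implementation, and `run_with_idle_timeout` is a hypothetical helper introduced for illustration.

```python
import subprocess
import threading
import time


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it is silent for idle_timeout seconds.

    Returns (returncode, timed_out, lines), where lines holds the raw
    output lines captured from the child's stdout and stderr.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    last_output = [time.monotonic()]
    lines = []

    def pump():
        # Every line received resets the idle clock.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            lines.append(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_timeout:
            timed_out = True
            proc.kill()
            break
        time.sleep(0.05)

    proc.wait()
    reader.join()
    return proc.returncode, timed_out, lines
```

Under this model, the change in comment 10 corresponds to raising `idle_timeout` from 3600 to 7200, while the flushing change in comment 6 was an attempt to reset the idle clock more often.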
Comment 11•12 years ago
In production
Comment 12•12 years ago
Looks like this worked, I see green builds.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•7 years ago
Component: General Automation → General