Closed Bug 774844 Opened 12 years ago Closed 11 years ago

Intermittent OS X 10.7 leakstats | log file incomplete (after "obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)! | Unknown event type 0xffffff90")

Categories

(Release Engineering :: General, defect)

Platform: x86_64 macOS
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [red])

Not sure if this should be in Core::* or Release Engineering.
OS X 10.7 64-bit mozilla-inbound leak test build on 2012-07-17 08:28:49 PDT for push 0888a88ab4c7
slave: bld-lion-r5-080
https://tbpl.mozilla.org/php/getParsedLog.php?id=13605090&tree=Mozilla-Inbound
{
========= Started compare current leak logs failed (results: 2, elapsed: 0 secs) (at 2012-07-17 09:02:59.237050) =========
obj-firefox/dist/bin/leakstats ../malloc.log
 in dir /builds/slave/m-in-osx64-dbg/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['obj-firefox/dist/bin/leakstats', '../malloc.log']
 environment:
  Apple_PubSub_Socket_Render=/tmp/launch-6NvL74/Render
  CCACHE_BASEDIR=/builds/slave/m-in-osx64-dbg
  CCACHE_COMPRESS=1
  CCACHE_DIR=/builds/ccache
  CCACHE_UMASK=002
  CVS_RSH=ssh
  DISPLAY=/tmp/launch-DIb19R/org.x:0
  HG_SHARE_BASE_DIR=/builds/hg-shared
  HOME=/Users/cltbld
  LC_ALL=C
  LOGNAME=cltbld
  MOZ_CRASHREPORTER_NO_REPORT=1
  MOZ_OBJDIR=obj-firefox
  MOZ_SIGN_CMD=python /builds/slave/m-in-osx64-dbg/tools/release/signing/signtool.py --cachedir /builds/slave/m-in-osx64-dbg/signing_cache -t /builds/slave/m-in-osx64-dbg/token -n /builds/slave/m-in-osx64-dbg/nonce -c /builds/slave/m-in-osx64-dbg/tools/release/signing/host.cert -H mac-signing3.build.scl1.mozilla.com:9100 -H mac-signing4.build.scl1.mozilla.com:9100
  PATH=/tools/python/bin:/tools/buildbot/bin:/opt/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/slave/m-in-osx64-dbg/build
  PYTHONPATH=/tools/buildbot/lib/python2.7/site-packages
  SHELL=/bin/bash
  SSH_AUTH_SOCK=/tmp/launch-ckJ4IH/Listeners
  TMPDIR=/var/folders/30/yq_p3wk15yb9wdsv6sm_m0v00000gn/T/
  USER=cltbld
  VERSIONER_PYTHON_PREFER_32_BIT=no
  VERSIONER_PYTHON_VERSION=2.7
  XPCOM_DEBUG_BREAK=stack-and-abort
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 using PTY: False
obj-firefox/dist/bin/leakstats starting at Tue Jul 17 09:02:59 2012
obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)!
Unknown event type 0xffffff90
obj-firefox/dist/bin/leakstats: log file incomplete
program finished with exit code 1
elapsedTime=0.245636
Unable to parse leakstats output
========= Finished compare current leak logs failed (results: 2, elapsed: 0 secs) (at 2012-07-17 09:02:59.506804) =========
}
Summary: OS X 10.7 build failure with: "obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)! | Unknown event type 0xffffff90 | obj-firefox/dist/bin/leakstats: log file incomplete | program finished with exit code 1" → OS X 10.7 compare current leak logs failure with: "obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)! | Unknown event type 0xffffff90 | obj-firefox/dist/bin/leakstats: log file incomplete | program finished with exit code 1"
https://tbpl.mozilla.org/php/getParsedLog.php?id=13761534&tree=Mozilla-Inbound
{
========= Started compare current leak logs failed (results: 2, elapsed: 0 secs) (at 2012-07-22 19:33:16.858487) =========
obj-firefox/dist/bin/leakstats ../malloc.log
 in dir /builds/slave/m-in-osx64-dbg/build (timeout 1200 secs)
 watching logfiles {}
 argv: ['obj-firefox/dist/bin/leakstats', '../malloc.log']
 environment:
  Apple_PubSub_Socket_Render=/tmp/launch-9hZd6s/Render
  CCACHE_BASEDIR=/builds/slave/m-in-osx64-dbg
  CCACHE_COMPRESS=1
  CCACHE_DIR=/builds/ccache
  CCACHE_UMASK=002
  CVS_RSH=ssh
  DISPLAY=/tmp/launch-ohFZM5/org.x:0
  HG_SHARE_BASE_DIR=/builds/hg-shared
  HOME=/Users/cltbld
  LC_ALL=C
  LOGNAME=cltbld
  MOZ_CRASHREPORTER_NO_REPORT=1
  MOZ_OBJDIR=obj-firefox
  MOZ_SIGN_CMD=python /builds/slave/m-in-osx64-dbg/tools/release/signing/signtool.py --cachedir /builds/slave/m-in-osx64-dbg/signing_cache -t /builds/slave/m-in-osx64-dbg/token -n /builds/slave/m-in-osx64-dbg/nonce -c /builds/slave/m-in-osx64-dbg/tools/release/signing/host.cert -H mac-signing3.build.scl1.mozilla.com:9100 -H mac-signing4.build.scl1.mozilla.com:9100
  PATH=/tools/python/bin:/tools/buildbot/bin:/opt/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
  PWD=/builds/slave/m-in-osx64-dbg/build
  PYTHONPATH=/tools/buildbot/lib/python2.7/site-packages
  SHELL=/bin/bash
  SSH_AUTH_SOCK=/tmp/launch-IMAG00/Listeners
  TMPDIR=/var/folders/30/yq_p3wk15yb9wdsv6sm_m0v00000gn/T/
  USER=cltbld
  VERSIONER_PYTHON_PREFER_32_BIT=no
  VERSIONER_PYTHON_VERSION=2.7
  XPCOM_DEBUG_BREAK=stack-and-abort
  __CF_USER_TEXT_ENCODING=0x1F5:0:0
 using PTY: False
obj-firefox/dist/bin/leakstats starting at Sun Jul 22 19:33:16 2012
Unknown event type 0xffffffe8
obj-firefox/dist/bin/leakstats: log file incomplete
program finished with exit code 1
elapsedTime=0.174396
Unable to parse leakstats output
========= Finished compare current leak logs failed (results: 2, elapsed: 0 secs) (at 2012-07-22 19:33:17.055132) =========
}
Depends on: 539334
Whiteboard: [orange][red] → [red]
Depends on: 828946
Summary: OS X 10.7 compare current leak logs failure with: "obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)! | Unknown event type 0xffffff90 | obj-firefox/dist/bin/leakstats: log file incomplete | program finished with exit code 1" → Intermittent OS X 10.7 leakstats | log file incomplete (after "obj-firefox/dist/bin/leakstats: no callsite for 'F' (70)! | Unknown event type 0xffffff90")
Note this doesn't appear on OrangeFactor due to bug 694170.
Please could you find someone to own this (it's a top orange at the moment)? :-)
Flags: needinfo?(catlee)
dbaron, any ideas about this?
Flags: needinfo?(dbaron)
Flags: needinfo?(catlee)
Is there any chance these machines are running out of disk space while running the test? Do we know how large the files typically are? Given that the diffbloatdump output isn't functioning correctly, I suspect the answer might be that we should just stop running the compare-logs part of this, at least (though maybe it's working on other platforms). That said, the part that's failing here isn't actually the compare-logs part. (The log comparison used to show run-to-run differences in allocation stacks: if a lot more memory was allocated from a particular stack than in the previous run, we'd show the stack for that allocation, in tree form. In other words, the point of the log comparison was that, for leak regressions in this test, the log would often contain enough data to debug the problem without doing anything else.) Maybe chat with the memshrink folks, though; they might know whether anybody is still using this. (Given that we've mostly stopped caring about cleaning everything up at shutdown, this has become a less useful test. It also had a UI that worked better with Tinderbox than with TBPL, so people stopped paying attention to it after the TBPL switch.)
Flags: needinfo?(dbaron) → needinfo?(khuey)
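The run-to-run comparison of allocation stacks described in the comment above can be pictured with a minimal Python sketch. This is not diffbloatdump's code and it does not parse malloc.log or any real log format; it assumes each run has already been reduced to a hypothetical mapping from a call stack (a tuple of frame names, outermost caller first) to bytes allocated. It subtracts one run from the other and prints the stacks that grew by at least a chosen threshold as an indented tree.

```python
def stack_growth(before, after):
    """Return {stack: delta_bytes} for every stack whose total changed.

    `before` and `after` are hypothetical per-run summaries of the form
    {(frame, frame, ...): bytes_allocated}.
    """
    deltas = {}
    for stack in set(before) | set(after):
        delta = after.get(stack, 0) - before.get(stack, 0)
        if delta != 0:
            deltas[stack] = delta
    return deltas


def build_tree(deltas, threshold):
    """Fold stacks that grew by at least `threshold` bytes into a tree.

    Each node is {"bytes": growth summed over the stacks below it,
    "children": {frame_name: node}}.
    """
    root = {"bytes": 0, "children": {}}
    for stack, delta in deltas.items():
        if delta < threshold:
            continue  # skip small changes (and shrinkage, for a positive threshold)
        node = root
        node["bytes"] += delta
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"bytes": 0, "children": {}})
            node["bytes"] += delta
    return root


def render(node, depth=0):
    """Return the tree as indented text lines, largest growth first."""
    lines = []
    children = sorted(node["children"].items(),
                      key=lambda item: item[1]["bytes"], reverse=True)
    for frame, child in children:
        lines.append("  " * depth + f"{frame} +{child['bytes']}")
        lines.extend(render(child, depth + 1))
    return lines
```

Folding stacks into a tree sums the growth of many stacks at their common callers, which is what makes a tree view useful for tracing a regression down to the frame that allocated the extra memory.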
Except s/previous run/some other run/: it stopped being a comparison against anything other than "some random other run" the minute we got a second buildslave running on a platform, or at least within a few hours of that, the first time one push did a clobber build while a later push did a dep build and finished before the earlier push's build. I think the last time we wanted to kill it, Standard8 and one other person (I forget who) spoke up on its behalf.
One of the costs of this failure that doesn't show up in the bug is how it gets handled on Try: nearly everyone retriggers the build, which also triggers twice as many tests (on all three OS X versions). An amusing, and very simple, way of dealing with it would be to just turn it into not-an-error, by taking out the TEST-UNEXPECTED-FAIL and returning 0. If we then completely break it without realizing, because it's silent, surely the people who rely on it working will notice, won't they?
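The not-an-error suggestion above can be sketched as a small Python wrapper. It is hypothetical: the real step is defined in the buildbot configuration, which is not shown in this bug, and the function name `run_nonfatal` and the `WARNING:` prefix are illustrative choices, not an existing Mozilla tool or log convention. The wrapper runs the command, passes its output through, and turns a nonzero exit into a warning line plus a 0 return code instead of a TEST-UNEXPECTED-FAIL.

```python
import subprocess
import sys


def run_nonfatal(argv):
    """Run argv; report a nonzero exit as a warning and return 0.

    Hypothetical wrapper for illustration only. If the program cannot be
    started at all (e.g. the binary is missing), subprocess.run raises and
    the exception propagates to the caller.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    # Pass the tool's own output through so the log keeps its diagnostics.
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    if result.returncode != 0:
        # Instead of emitting TEST-UNEXPECTED-FAIL and failing the step,
        # leave a visible but non-fatal note in the log.
        print(f"WARNING: {argv[0]} exited with code {result.returncode}; "
              "treating the leak log comparison as informational")
    return 0
```

For this bug it would be called as `run_nonfatal(["obj-firefox/dist/bin/leakstats", "../malloc.log"])`. The trade-off is the one the comment points out: the failure stays in the log but no longer turns the build red, so a real breakage could go unnoticed.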
Depends on: 887234