A host of segfaults on index runs from yesterday. Not limited to trees that compile.
dxr-processor1, which was indexing build-central and l10n-mozilla-aurora (all text-only):
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: clang: segfault at 0 ip 00002b84592ce356 sp 00007ffcd8a84f30 error 4 in libclang-index-plugin.so[2b84592b4000+1e000]
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com abrt: Can't open /proc/4334/status: No such file or directory
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: clang: segfault at 0 ip 00002b23dec24356 sp 00007ffc2bc3fdb0 error 4 in libclang-index-plugin.so[2b23dec0a000+1e000]
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com abrt: Can't open /proc/4363/status: No such file or directory
dxr-processor2, which was building mozilla-beta:
Jul 20 18:29:45 dxr-processor2.dmz.scl3.mozilla.com kernel: clang: segfault at 0 ip 00002ad7feaaa356 sp 00007ffc0e537090 error 4 in libclang-index-plugin.so[2ad7fea90000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt: Can't open /proc/1262/status: No such file or directory
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com kernel: clang: segfault at 0 ip 00002b3e03260356 sp 00007fff9477aa10 error 4 in libclang-index-plugin.so[2b3e03246000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt: Can't open /proc/1276/status: No such file or directory
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com kernel: clang: segfault at 0 ip 00002b5cd2f88356 sp 00007fff05796100 error 4 in libclang-index-plugin.so[2b5cd2f6e000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt: Can't open /proc/1427/status: No such file or directory
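For what it's worth, subtracting the mapping base (the number in brackets) from the instruction pointer in each kernel line gives the offset of the faulting instruction inside the plugin. A minimal sketch of that arithmetic, assuming the line layout is exactly as pasted above (the regex and the `crash_offset` helper are made up for illustration, not part of DXR):

```python
import re

# Field layout copied from the kernel lines quoted in this bug.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def crash_offset(line):
    """Return (library, ip - mapping base), or None if the line doesn't match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return m.group("lib"), int(m.group("ip"), 16) - int(m.group("base"), 16)

# One line from each processor, taken from the logs above.
lines = [
    "kernel: clang: segfault at 0 ip 00002b84592ce356 sp 00007ffcd8a84f30 "
    "error 4 in libclang-index-plugin.so[2b84592b4000+1e000]",
    "kernel: clang: segfault at 0 ip 00002ad7feaaa356 sp 00007ffc0e537090 "
    "error 4 in libclang-index-plugin.so[2ad7fea90000+1e000]",
]

results = [crash_offset(line) for line in lines]
for lib, offset in results:
    print(lib, hex(offset))  # both print: libclang-index-plugin.so 0x1a356
```

All five kernel lines work out to the same offset, 0x1a356, and "segfault at 0" with error 4 is a user-mode read of an unmapped address, so this looks like the same null dereference in the plugin each time. If a build of the plugin with symbols is available, something like addr2line on that offset might point at the source line, though I haven't tried it.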
I've seen this error before if I upgrade clang and then do not re-run "make clean all" in dxr to recompile libclang-index-plugin.so. Just running make isn't enough because it will skip the file if it exists already, even if it was built against the wrong clang version.
Worth a try. Though if just `make` is doing the wrong thing, there's something wrong with our makefile. Make should compare the mod dates on the source code and build artifacts and rebuild if the latter is older.
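To spell out the rule being described: make rebuilds a target when it is missing or older than any of its listed prerequisites. A rough Python sketch of that check, with hypothetical file names (`plugin.cpp`, `plugin.so`) and a made-up `is_stale` helper, not anything taken from the DXR makefile:

```python
import os
import tempfile

def is_stale(target, sources):
    """True if target is missing or older than any source (make's basic rule)."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "plugin.cpp")  # hypothetical source file
    out = os.path.join(tmp, "plugin.so")   # hypothetical build artifact
    for path in (src, out):
        open(path, "w").close()

    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh_case = is_stale(out, [src])      # artifact newer than source

    os.utime(src, (3000, 3000))
    stale_case = is_stale(out, [src])      # source touched after the build

    missing_case = is_stale(os.path.join(tmp, "missing.so"), [src])

print(fresh_case, stale_case, missing_case)  # False True True
```

Note that the check only sees files named as prerequisites. If the makefile lists just the plugin sources (an assumption, I haven't checked), an upgraded clang would not make the old .so look stale, which would be consistent with plain `make` skipping the rebuild.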
I haven't (yet) upgraded clang, and the docker images shouldn't have a previous build of DXR on them, so that sounds like a red herring?
build-central has a blank build_command (https://github.com/mozilla-platform-ops/dxr-infra/blob/master/dxr.config#L354), so it shouldn't even be running clang. This is puzzling.
l10n-mozilla-aurora-tree has never had a failure, according to Jenkins, so it's build-central we should be examining. It looks like the crash happened during indexing, not during building (https://jenkins-dxr.mozilla.org/job/build-central-tree/110/console), which is even more puzzling. Clang doesn't run then, even if it was supposed to run at all.
It may or may not be helpful to go back and look at runs from over the weekend, as there were also a lot of segfaults on the 17th-18th, and correlate timestamps in /var/log/messages with whatever ran.
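A sketch of how that correlation could be scripted, assuming we can get start and end times for each run out of Jenkins. The run windows and the year below are placeholders made up for illustration, not real job data, and both helper functions are hypothetical:

```python
from datetime import datetime

def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; syslog omits the year."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def jobs_running_at(when, runs):
    """Names of runs whose [start, end] window contains the timestamp."""
    return [name for name, start, end in runs if start <= when <= end]

# Placeholder windows: (job name, start, end). Not real Jenkins data.
runs = [
    ("build-central", datetime(2016, 7, 20, 12, 0), datetime(2016, 7, 20, 13, 30)),
    ("mozilla-beta", datetime(2016, 7, 20, 18, 0), datetime(2016, 7, 20, 19, 0)),
]

line = ("Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: "
        "clang: segfault at 0 ip 00002b84592ce356")
when = parse_syslog_time(line, 2016)  # year is an assumption
matches = jobs_running_at(when, runs)
print(when, matches)
```

This only narrows things down per host, so the windows would need to be filtered to the processor the log line came from when several jobs overlap.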
It looks like any failures of those trees around the 17th or 18th have aged out of Jenkins.
If we can pinpoint a tree that reliably segfaults, I can try to repro it locally and debug this. With luck, it'll be one that doesn't take 21 hours to build. :-/
moz-aurora segfaults (on dxr-proc2) pretty quickly after the build starts, says fubar 7/21 at 15:43. I shall try to repro that locally.
And after updating to clang 3.6 on prod, the segfaults have (so far) gone away.
Built m-c late on Friday and it compiled successfully. However, indexing never finished. Last I saw, it was still at 99% many hours after compilation finished, by which point indexing should have completed. I wonder if this new behavior is related to the increase in index runs never finishing (e.g., incubator-central is currently at 2d22hr, labs-central at 17h (used to take 10 minutes!!)).
Segfaults are gone. Filed bug 1289411 to track the issue with indexing taking forever.