Status

Webtools
DXR
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: fubar, Assigned: erik)

Tracking

Trunk

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

a year ago
A host of segfaults on index runs from yesterday. Not limited to trees that compile.

dxr-processor1, which was indexing build-central and l10n-mozilla-aurora (all text-only):

Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: clang[23460]: segfault at 0 ip 00002b84592ce356 sp 00007ffcd8a84f30 error 4 in libclang-index-plugin.so[2b84592b4000+1e000]
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com abrt[23461]: Can't open /proc/4334/status: No such file or directory
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: clang[23490]: segfault at 0 ip 00002b23dec24356 sp 00007ffc2bc3fdb0 error 4 in libclang-index-plugin.so[2b23dec0a000+1e000]
Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com abrt[23491]: Can't open /proc/4363/status: No such file or directory

dxr-processor2, which was building mozilla-beta:

Jul 20 18:29:45 dxr-processor2.dmz.scl3.mozilla.com kernel: clang[4048]: segfault at 0 ip 00002ad7feaaa356 sp 00007ffc0e537090 error 4 in libclang-index-plugin.so[2ad7fea90000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt[4049]: Can't open /proc/1262/status: No such file or directory
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com kernel: clang[4063]: segfault at 0 ip 00002b3e03260356 sp 00007fff9477aa10 error 4 in libclang-index-plugin.so[2b3e03246000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt[4064]: Can't open /proc/1276/status: No such file or directory
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com kernel: clang[4215]: segfault at 0 ip 00002b5cd2f88356 sp 00007fff05796100 error 4 in libclang-index-plugin.so[2b5cd2f6e000+1e000]
Jul 20 18:29:46 dxr-processor2.dmz.scl3.mozilla.com abrt[4216]: Can't open /proc/1427/status: No such file or directory
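
One thing the kernel lines allow without any debugger: subtracting the mapping base (the first number in the square brackets) from the instruction pointer gives the offset of the faulting instruction inside libclang-index-plugin.so. The sketch below does that arithmetic for the five reports in this bug; the regex and the `crash_offset` helper are illustrative names, not part of DXR. On x86, "segfault at 0" with "error 4" denotes a user-mode read of an unmapped page at address 0, so an identical offset across all crashes would suggest a null-pointer read at one fixed spot in the plugin rather than random corruption.

```python
import re

# Kernel segfault report format, as it appears in the log lines of this bug.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def crash_offset(line):
    """Return (library, ip - load base) for a segfault line, else None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return m.group("lib"), int(m.group("ip"), 16) - int(m.group("base"), 16)


# The segfault portion of the five kernel lines reported in this bug.
REPORTS = [
    "segfault at 0 ip 00002b84592ce356 sp 00007ffcd8a84f30 error 4 in libclang-index-plugin.so[2b84592b4000+1e000]",
    "segfault at 0 ip 00002b23dec24356 sp 00007ffc2bc3fdb0 error 4 in libclang-index-plugin.so[2b23dec0a000+1e000]",
    "segfault at 0 ip 00002ad7feaaa356 sp 00007ffc0e537090 error 4 in libclang-index-plugin.so[2ad7fea90000+1e000]",
    "segfault at 0 ip 00002b3e03260356 sp 00007fff9477aa10 error 4 in libclang-index-plugin.so[2b3e03246000+1e000]",
    "segfault at 0 ip 00002b5cd2f88356 sp 00007fff05796100 error 4 in libclang-index-plugin.so[2b5cd2f6e000+1e000]",
]

# Collapse to the set of distinct (library, offset) pairs.
offsets = {crash_offset(r) for r in REPORTS}
for lib, off in sorted(offsets):
    print(f"{lib}+{off:#x}")
# prints: libclang-index-plugin.so+0x1a356
```

All five crashes, on both processors, land on the same offset, which could then be fed to a symbolizer against the exact plugin binary that was loaded.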

Comment 1

a year ago
I've seen this error before when I upgraded clang and then did not re-run "make clean all" in dxr to recompile libclang-index-plugin.so. Just running make isn't enough, because it skips the file if it already exists, even if it was built against the wrong clang version.
(Assignee)

Comment 2

a year ago
Worth a try. Though if just `make` is doing the wrong thing, there's something wrong with our makefile. Make should compare the mod dates on the source code and build artifacts and rebuild if the latter is older.
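
For reference, the timestamp rule described here can be sketched in a few lines. This is a minimal illustration of the general rule make applies, not DXR's makefile; `needs_rebuild` and its arguments are hypothetical names.

```python
import os


def needs_rebuild(target, sources):
    """Apply make's basic rule: rebuild if the target is missing
    or any prerequisite has a newer modification time."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

Note that this check only sees files listed as prerequisites. Upgrading the compiler does not touch the plugin's source files, so a purely timestamp-based rule would keep an existing plugin unless the makefile also tracks something that changes with the clang version.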
(Reporter)

Comment 3

a year ago
I haven't (yet) upgraded clang, and the docker images shouldn't have a previous build of DXR on them, so that sounds like a red herring?
(Assignee)

Comment 4

a year ago
build-central has a blank build_command (https://github.com/mozilla-platform-ops/dxr-infra/blob/master/dxr.config#L354), so it shouldn't even be running clang. This is puzzling.
(Assignee)

Comment 5

a year ago
l10n-mozilla-aurora-tree has never had a failure, according to Jenkins, so it's build-central we should be examining. It looks like the crash happened during indexing, not during building (https://jenkins-dxr.mozilla.org/job/build-central-tree/110/console), which is even more puzzling. Clang doesn't run during indexing, even for trees where it is supposed to run at all.
(Reporter)

Comment 6

a year ago
It may or may not be helpful to go back and look at runs from over the weekend, as there were also a lot of segfaults on the 17th-18th, and correlate timestamps in /var/log/messages with whatever ran.
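
A rough sketch of that correlation, assuming the job start and end times can be exported from Jenkins in some form. The `runs` list below is made-up placeholder data, the helper names are hypothetical, and the year is passed in explicitly because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Kernel segfault lines in syslog format: "Mon DD HH:MM:SS host kernel: ..."
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: .*segfault at"
)


def segfault_times(lines, year):
    """Return (host, datetime) for every kernel segfault line."""
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            hits.append((m.group("host"), ts))
    return hits


def jobs_running_at(host, when, runs):
    """Return the names of jobs whose window on `host` contains `when`."""
    return [
        r["job"]
        for r in runs
        if r["host"] == host and r["start"] <= when <= r["end"]
    ]


# Placeholder job windows for illustration only; real values would come
# from Jenkins build records.
runs = [
    {
        "job": "build-central-tree",
        "host": "dxr-processor1.dmz.scl3.mozilla.com",
        "start": datetime(2016, 7, 20, 12, 50, 0),
        "end": datetime(2016, 7, 20, 13, 5, 0),
    },
]

with_year = 2016  # assumed; not stated in the log lines
log = [
    "Jul 20 12:53:09 dxr-processor1.dmz.scl3.mozilla.com kernel: clang[23460]: "
    "segfault at 0 ip 00002b84592ce356 sp 00007ffcd8a84f30 error 4 in "
    "libclang-index-plugin.so[2b84592b4000+1e000]",
]
for host, when in segfault_times(log, with_year):
    print(host, when.isoformat(), jobs_running_at(host, when, runs))
```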
(Assignee)

Comment 7

a year ago
It looks like any failures of those trees around the 17th or 18th have aged out of Jenkins.
(Assignee)

Comment 8

a year ago
If we can pinpoint a tree that reliably segfaults, I can try to repro it locally and debug this. With luck, it'll be one that doesn't take 21 hours to build. :-/
(Assignee)

Comment 9

a year ago
moz-aurora segfaults (on dxr-proc2) pretty quickly after the build starts, says fubar 7/21 at 15:43. I shall try to repro that locally.
Assignee: nobody → erik
(Assignee)

Comment 10

a year ago
And after updating to clang 3.6 on prod, the segfaults have (so far) gone away.
(Reporter)

Comment 11

a year ago
Built m-c late on Friday and it compiled successfully. However, indexing never finished: the last I saw, it was still at 99% many hours after compilation finished, well past when indexing should have completed. I wonder if this new behavior is related to the increase in index runs never finishing (e.g., incubator-central is currently at 2d22h, and labs-central is at 17h; it used to take 10 minutes!).
(Reporter)

Comment 12

a year ago
Segfaults are gone; filed bug 1289411 to track the issue with indexing taking forever.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED