ASAN builds fail with clang 5.0

RESOLVED FIXED in Firefox 58

Status

defect
P3
normal
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: glandium, Assigned: glandium)

Tracking

unspecified
mozilla58
Dependency tree / graph

Firefox Tracking Flags

(firefox58 fixed)

Details

Attachments

(1 attachment)

https://treeherder.mozilla.org/logviewer.html#?job_id=136732246&repo=try&lineNumber=27705

[task 2017-10-13T07:09:00.452Z] 07:09:00     INFO -  Executing /builds/worker/workspace/build/src/obj-firefox/dist/bin/shlibsign -v -o ../../dist/firefox/libsoftokn3.chk -i ../../dist/firefox/libsoftokn3.so
[task 2017-10-13T07:09:00.453Z] 07:09:00     INFO -  Library File: ../../dist/firefox/libsoftokn3.so 934664 bytes
[task 2017-10-13T07:09:00.453Z] 07:09:00     INFO -  Check File: ../../dist/firefox/libsoftokn3.chk
[task 2017-10-13T07:09:00.453Z] 07:09:00     INFO -    hash: 32 bytes
[task 2017-10-13T07:09:00.453Z] 07:09:00     INFO -      48 ee d5 a0 35 01 7f 44 bf a1
[task 2017-10-13T07:09:00.454Z] 07:09:00     INFO -      a0 b7 3a ce f3 86 7b 1f 70 04
[task 2017-10-13T07:09:00.454Z] 07:09:00     INFO -      38 95 0c 2c f5 76 10 b7 d5 8d
[task 2017-10-13T07:09:00.454Z] 07:09:00     INFO -      d9 a1
[task 2017-10-13T07:09:00.454Z] 07:09:00     INFO -    signature: 64 bytes
[task 2017-10-13T07:09:00.454Z] 07:09:00     INFO -      3a 65 c3 ab 34 74 ba 8e d1 d8
[task 2017-10-13T07:09:00.455Z] 07:09:00     INFO -      75 db 1b 38 95 9b ad 16 79 fa
[task 2017-10-13T07:09:00.455Z] 07:09:00     INFO -      ab e9 f4 77 1e ed 90 e4 00 24
[task 2017-10-13T07:09:00.455Z] 07:09:00     INFO -      61 de 0b e9 8a fc 24 52 98 11
[task 2017-10-13T07:09:00.455Z] 07:09:00     INFO -      a2 7e 4e 70 58 0a f5 25 6e 24
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -      d7 1e 58 eb 05 08 83 a7 13 8c
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -      f3 c1 bf 0c
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -  ==36118==LeakSanitizer has encountered a fatal error.
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -  ==36118==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -  ==36118==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
[task 2017-10-13T07:09:00.456Z] 07:09:00     INFO -  Traceback (most recent call last):
[task 2017-10-13T07:09:00.457Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/toolkit/mozapps/installer/packager.py", line 333, in <module>
[task 2017-10-13T07:09:00.457Z] 07:09:00     INFO -      main()
[task 2017-10-13T07:09:00.457Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/toolkit/mozapps/installer/packager.py", line 327, in main
[task 2017-10-13T07:09:00.457Z] 07:09:00     INFO -      copier.copy(args.destination)
[task 2017-10-13T07:09:00.457Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozpack/copier.py", line 431, in copy
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -      copy_results.append((destfile, f.copy(destfile, skip_if_older)))
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/toolkit/mozapps/installer/packager.py", line 125, in copy
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -      errors.fatal('Error while signing %s' % self.path)
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozpack/errors.py", line 103, in fatal
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -      self._handle(self.FATAL, msg)
[task 2017-10-13T07:09:00.458Z] 07:09:00     INFO -    File "/builds/worker/workspace/build/src/python/mozbuild/mozpack/errors.py", line 98, in _handle
[task 2017-10-13T07:09:00.459Z] 07:09:00     INFO -      raise ErrorMessage(msg)
[task 2017-10-13T07:09:00.459Z] 07:09:00     INFO -  mozpack.errors.ErrorMessage: Error: Error while signing ../../dist/firefox/libsoftokn3.so
Setting the suggested LSAN_OPTIONS only yields:

[task 2017-10-17T06:54:14.055Z] 06:54:14     INFO -  Executing /builds/worker/workspace/build/src/obj-firefox/dist/bin/shlibsign -v -o ../../dist/firefox/libsoftokn3.chk -i ../../dist/firefox/libsoftokn3.so
[task 2017-10-17T06:54:14.055Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_printf'
[task 2017-10-17T06:54:14.056Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_sprintf'
[task 2017-10-17T06:54:14.056Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_snprintf'
[task 2017-10-17T06:54:14.056Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_fprintf'
[task 2017-10-17T06:54:14.056Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_vprintf'
[task 2017-10-17T06:54:14.057Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
[task 2017-10-17T06:54:14.057Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
[task 2017-10-17T06:54:14.057Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
[task 2017-10-17T06:54:14.057Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept 'process_vm_readv'
[task 2017-10-17T06:54:14.058Z] 06:54:14     INFO -  ==36334==AddressSanitizer: failed to intercept 'process_vm_writev'
[task 2017-10-17T06:54:14.058Z] 06:54:14     INFO -  ==36334==AddressSanitizer: libc interceptors initialized
[task 2017-10-17T06:54:14.058Z] 06:54:14     INFO -  || `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
[task 2017-10-17T06:54:14.058Z] 06:54:14     INFO -  || `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
[task 2017-10-17T06:54:14.059Z] 06:54:14     INFO -  || `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
[task 2017-10-17T06:54:14.059Z] 06:54:14     INFO -  || `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
[task 2017-10-17T06:54:14.059Z] 06:54:14     INFO -  || `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
[task 2017-10-17T06:54:14.059Z] 06:54:14     INFO -  MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
[task 2017-10-17T06:54:14.059Z] 06:54:14     INFO -  redzone=16
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  max_redzone=2048
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  quarantine_size_mb=256M
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  thread_local_quarantine_size_kb=1024K
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  malloc_context_size=30
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  SHADOW_SCALE: 3
[task 2017-10-17T06:54:14.060Z] 06:54:14     INFO -  SHADOW_GRANULARITY: 8
[task 2017-10-17T06:54:14.061Z] 06:54:14     INFO -  SHADOW_OFFSET: 0x7fff8000
[task 2017-10-17T06:54:14.061Z] 06:54:14     INFO -  ==36334==Installed the sigaction for signal 11
[task 2017-10-17T06:54:14.061Z] 06:54:14     INFO -  ==36334==Installed the sigaction for signal 7
[task 2017-10-17T06:54:14.061Z] 06:54:14     INFO -  ==36334==Installed the sigaction for signal 8
[task 2017-10-17T06:54:14.061Z] 06:54:14     INFO -  ==36334==T0: stack [0x7ffd91a76000,0x7ffd92276000) size 0x800000; local=0x7ffd92273f9c
[task 2017-10-17T06:54:14.062Z] 06:54:14     INFO -  ==36334==AddressSanitizer Init done
[task 2017-10-17T06:54:14.062Z] 06:54:14     INFO -  Library File: ../../dist/firefox/libsoftokn3.so 1579360 bytes
[task 2017-10-17T06:54:14.062Z] 06:54:14     INFO -  Check File: ../../dist/firefox/libsoftokn3.chk
[task 2017-10-17T06:54:14.062Z] 06:54:14     INFO -    hash: 32 bytes
[task 2017-10-17T06:54:14.062Z] 06:54:14     INFO -      0b 65 16 cc 1d 3f 5e 0b 3d ba
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -      6b eb 67 e3 28 f6 10 a2 98 e0
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -      9b 04 b0 ae 3c 62 8b 1d 6b 5f
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -      89 54
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -    signature: 64 bytes
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -      17 ac 2b 35 63 00 c4 bc 53 e6
[task 2017-10-17T06:54:14.063Z] 06:54:14     INFO -      91 33 92 3c c0 78 35 e7 1b c6
[task 2017-10-17T06:54:14.064Z] 06:54:14     INFO -      71 df 5e 9c e5 c6 e4 04 d4 d0
[task 2017-10-17T06:54:14.064Z] 06:54:14     INFO -      ca fc 77 c7 14 a8 4c bb 08 3d
[task 2017-10-17T06:54:14.064Z] 06:54:14     INFO -      38 f1 1b da a1 d7 c3 ba cb e2
[task 2017-10-17T06:54:14.064Z] 06:54:14     INFO -      62 99 fb 7d a0 60 a6 29 db a6
[task 2017-10-17T06:54:14.064Z] 06:54:14     INFO -      17 b5 31 64
[task 2017-10-17T06:54:14.065Z] 06:54:14     INFO -  ==36335==Could not attach to thread 36334 (errno 1).
[task 2017-10-17T06:54:14.065Z] 06:54:14     INFO -  ==36335==Failed suspending threads.

(most likely because of the limitations on ptrace on taskcluster) :(
Assignee: nobody → nobody
Component: Build Config → Build
Product: Core → NSS
Version: unspecified → other
We're using clang 5.0 on OSS-Fuzz, as well as on TC with ASan+UBSan for fuzzing [1]. Both are Docker-based. On TC we had to enable the 'allowPtrace' feature [2] to make this work. Did you try that?

[1] https://searchfox.org/nss/source/automation/taskcluster/docker-fuzz/setup.sh#45
(the comment is old, we're just using the latest version there)

[2] https://searchfox.org/nss/source/automation/taskcluster/graph/src/extend.js#495
Tried again with allowPtrace:
https://tools.taskcluster.net/groups/LLtRPq0vSByMsFvxZat2ig/tasks/Px3fUGMzQY2VLrO0bs76Kw/details
Still failed similarly, with the same "Could not attach to thread x" message.
https://queue.taskcluster.net/v1/task/Px3fUGMzQY2VLrO0bs76Kw/runs/0/artifacts/public/logs/live_backing.log

Note that one was with clang 4, which fails too.

Another with clang 5 and allowPtrace here:
https://queue.taskcluster.net/v1/task/UwzBCvnSQfWSbZ6bYZuxjQ/runs/0/artifacts/public/logs/live_backing.log
We fail when running shlibsign, I guess we don't need that for ASan builds anyway? We have -Dsign_libs=0 set for our fuzzing builds. IIRC we ran into similar issues, or TC limitations, and decided to just turn that off.
This is the same failure as bug 1377561 and it is not specific to Clang 5.0. Our version 0f 4.0.1 on TC also suffers from the same problem. For the fuzzing build (which uses 4.0.1) we circumvented this by disabling LSan (which is not a great option for the general ASan builds).

As mentioned in the other bug, I could reproduce this locally, but only with the version we have on TC. The Ubuntu 4.0 and 5.0 versions from the LLVM repos do not reproduce this.
Mike, can you tell me where/how to download the Clang 5 toolchain build you used for this? I couldn't find it on TC. Thanks!
Flags: needinfo?(mh+mozilla)
Nevermind the needinfo, I found it already, but I also managed to reproduce it locally with clang 4 from TC.

(In reply to Mike Hommey [:glandium] from comment #1)
> [task 2017-10-17T06:54:14.065Z] 06:54:14     INFO -  ==36335==Could not
> attach to thread 36334 (errno 1).
> [task 2017-10-17T06:54:14.065Z] 06:54:14     INFO -  ==36335==Failed
> suspending threads.

The exact same thing also happens locally on my normal machine (Ubuntu 16.04 LTS). If set /proc/sys/kernel/yama/ptrace_scope to 0, the error goes away. That option should normally only affect ptrace on non-child processes, right?

dmesg also shows the error:

> [45196470.781634] ptrace of pid 16494 was attempted by: shlibsign (pid 16495)

I'm still trying to find a minimal test for this now. If anyone has an idea, please let me know :)
Flags: needinfo?(mh+mozilla)
Build artifacts expire quickly on try, you'd need to do another push to get one.
I think I found something that might explain this. I confirmed two straces (I know LSan doesn't work under strace) and saw that the binary compiled with my local Clangs perform an additional syscall that the binary produced with Clang from TC does not:

> prctl(PR_SET_PTRACER, 9434)             = 0

I traced this prctl call to this code in LSan's runtime:

>    // On some systems we have to explicitly declare that we want to be traced
>    // by the tracer thread.
>#ifdef PR_SET_PTRACER
>    internal_prctl(PR_SET_PTRACER, tracer_pid, 0, 0, 0);
>#endif


Could it be that this code is not included in the Clang build on TC? The Clang I downloaded from RelEng is 4.0.1 (which also had exactly the same problem). It would explain the symptoms we see here.

Mike, any idea about this?
Flags: needinfo?(mh+mozilla)
PR_SET_PTRACER is new to Linux kernel 3.4, and the headers we have on the build machines are those of kernel 2.6.32. Even after bug 1399679, they would still be those of kernel 3.2.

But the builds *do* run on kernel 3.13, meaning it's available at runtime. So if we could define PR_SET_PTRACER to the right value (0x59616d61) somehow, that code would be enabled. It seems like patching that file to have

#ifndef PR_SET_PTRACER
#define PR_SET_PTRACER 0x59616d61
#endif

would be the right thing to do and upstream.
Flags: needinfo?(mh+mozilla)
Assignee: nobody → mh+mozilla
Component: Build → Build Config
Product: NSS → Core
Version: other → unspecified
Blocks: 1415085
No longer blocks: 1415085
Depends on: 1415083
Blocks: 1415083
No longer depends on: 1415083
Comment on attachment 8925833 [details]
Bug 1409267 - Patch compiler-rt to always use prctl(PR_SET_PTRACER).

https://reviewboard.mozilla.org/r/197026/#review202258

r=me with nit addressed; feel free to rephrase the comment or not as you see fit.

::: build/build-clang/pr_set_ptracer.patch:7
(Diff revision 1)
> ++// Sufficiently old kernel header don't provide this value, but we can still
> ++// call prctl with it if the runtime kernel is newer. Even if it's not, prctl
> ++// will only return EINVAL.

Nit: "old kernel headers"

WDYT about rephrasing this to:

"Sufficiently old kernel headers don't provide this value, but we can still call prctl with it.  If the runtime kernel is new enough, the prctl call will have the desired effect; if the kernel's too old, the call will error and we can ignore said error."

The "call prctl with it if the runtime kernel is newer.  Even if it's not..." phrasing is a little strange, since we're saying we can only call prctl with the value on a new enough kernel, but then we turn around and contradict ourselves.
Attachment #8925833 - Flags: review?(nfroyd) → review+
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/b2dd86a66344
Patch compiler-rt to always use prctl(PR_SET_PTRACER).  r=froydnj
Patch landed upstream: https://reviews.llvm.org/rL317668
https://hg.mozilla.org/mozilla-central/rev/b2dd86a66344
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.