Closed Bug 942678 Opened 11 years ago Closed 10 years ago

Intermittent ASAN mochitest timeouts from a heap-buffer-overflow in _PR_SetLogModuleLevel

Categories

(Toolkit :: about:memory, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=31029670&tree=Mozilla-Inbound
Ubuntu ASAN VM 12.04 x64 mozilla-inbound opt test mochitest-other on 2013-11-24 18:53:31 PST for push 18eadd057bee
slave: tst-linux64-ec2-090

19:07:28     INFO -  17109 INFO TEST-START | chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_aboutmemory5.xul
19:07:29     INFO -  =================================================================
19:07:29     INFO -  ==2754==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fff6592aef9 at pc 0x7f2547c89d5f bp 0x7fff65929bf0 sp 0x7fff65929be8
19:07:29     INFO -  READ of size 1 at 0x7fff6592aef9 thread T0
19:07:29     INFO -      #0 0x7f2547c89d5e in _PR_SetLogModuleLevel /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/io/prlog.c:315
19:07:29     INFO -      #1 0x7f2547c89d5e in PR_NewLogModule /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/io/prlog.c:360
19:07:29     INFO -      #2 0x7f2547ca615b in _PR_InitStuff /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/misc/prinit.c:153
19:07:29     INFO -      #3 0x7f2547ca615b in _PR_ImplicitInitialization /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/misc/prinit.c:219
19:07:29     INFO -      #4 0x7f2547c95ab1 in PR_NewRWLock /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/threads/prrwlock.c:104
19:07:30     INFO -  Xlib:  extension "RANDR" missing on display ":0".
19:07:32     INFO -  Xlib:  extension "RANDR" missing on display ":0".
19:07:32     INFO -      #5 0x7f25490ef53a in __cxx_global_var_init /builds/slave/m-in-l64-asan-0000000000000000/build/media/webrtc/signaling/src/common/browser_logging/CSFLog.cpp:61
19:07:32     INFO -      #6 0x7f25490ef53a in global constructors keyed to a /builds/slave/m-in-l64-asan-0000000000000000/build/media/webrtc/signaling/./src/common/browser_logging/CSFLog.cpp:76
19:07:32     INFO -      #7 0x7f255317a305 (/lib64/ld-linux-x86-64.so.2+0xf305)
19:07:32     INFO -      #8 0x7f255317a3de (/lib64/ld-linux-x86-64.so.2+0xf3de)
19:07:32     INFO -      #9 0x7f255316c6e9 (/lib64/ld-linux-x86-64.so.2+0x16e9)
19:07:32     INFO -  Address 0x7fff6592aef9 is located in stack of thread T0 at offset 4793 in frame
19:07:32     INFO -      #0 0x7f2547c8997f in PR_NewLogModule /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/io/prlog.c:349
19:07:32     INFO -    This frame has 3 object(s):
19:07:32     INFO -      [32, 96) 'module.i'
19:07:32     INFO -      [128, 132) 'level.i'
19:07:32     INFO -      [192, 196) 'delta.i'
19:07:32     INFO -  HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
19:07:32     INFO -        (longjmp and C++ exceptions *are* supported)
19:07:32     INFO -  SUMMARY: AddressSanitizer: heap-buffer-overflow /builds/slave/m-in-l64-asan-0000000000000000/build/nsprpub/pr/src/io/prlog.c:315 _PR_SetLogModuleLevel
19:07:32     INFO -  Shadow bytes around the buggy address:
19:07:32     INFO -    0x10006cb1d580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d590: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d5a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d5b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d5c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -  =>0x10006cb1d5d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
19:07:32     INFO -    0x10006cb1d5e0:fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d5f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
19:07:32     INFO -    0x10006cb1d600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
19:07:32     INFO -    0x10006cb1d610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
19:07:32     INFO -    0x10006cb1d620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
19:07:32     INFO -  Shadow byte legend (one shadow byte represents 8 application bytes):
19:07:32     INFO -    Addressable:           00
19:07:32     INFO -    Partially addressable: 01 02 03 04 05 06 07
19:07:32     INFO -    Heap left redzone:     fa
19:07:32     INFO -    Heap right redzone:    fb
19:07:32     INFO -    Freed heap region:     fd
19:07:32     INFO -    Stack left redzone:    f1
19:07:32     INFO -    Stack mid redzone:     f2
19:07:32     INFO -    Stack right redzone:   f3
19:07:32     INFO -    Stack partial redzone: f4
19:07:32     INFO -    Stack after return:    f5
19:07:32     INFO -    Stack use after scope: f8
19:07:32     INFO -    Global redzone:        f9
19:07:32     INFO -    Global init order:     f6
19:07:32     INFO -    Poisoned by user:      f7
19:07:32     INFO -    ASan internal:         fe
19:07:32     INFO -  ==2754==ABORTING
19:07:32     INFO -  [Parent 2420] WARNING: pipe error (74): Connection reset by peer: file /builds/slave/m-in-l64-asan-0000000000000000/build/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 446
19:07:32     INFO -  ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
19:07:32     INFO -  [Parent 2420] WARNING: waitpid failed pid:2754 errno:10: file /builds/slave/m-in-l64-asan-0000000000000000/build/ipc/chromium/src/base/process_util_posix.cc, line 254
19:07:34     INFO -  [Parent 2420] WARNING: waitpid failed pid:2754 errno:10: file /builds/slave/m-in-l64-asan-0000000000000000/build/ipc/chromium/src/base/process_util_posix.cc, line 254
19:07:34     INFO -  [Parent 2420] WARNING: Failed to deliver SIGKILL to 2754!(3).: file /builds/slave/m-in-l64-asan-0000000000000000/build/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc, line 118
19:12:36     INFO -  Xlib:  extension "RANDR" missing on display ":0".
(screenshot)
19:12:39     INFO -  17110 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_aboutmemory5.xul | Test timed out.
19:12:39     INFO -  17111 INFO TEST-INFO | MEMORY STAT vsize after test: 21991730094080
19:12:39     INFO -  17112 INFO TEST-INFO | MEMORY STAT residentFast after test: 964825088
19:12:39     INFO -  17113 INFO TEST-END | chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_aboutmemory5.xul | finished in 308727ms
prlog.c:315 looks like this:

    ev = PR_GetEnv("NSPR_LOG_MODULES");
    if (ev && ev[0]) {

which suggests that something is screwy with the environment... the "fa" elements indicate that we're 32 bytes before the start of a legitimate heap block.
Note that this is when processing CSFLog.cpp:61, which is:

static PRRWLock *maplock = PR_NewRWLock(0,"thread map");

Seems very odd that this would fail (see njn's comment about PR_GetEnv(), which implies it returned a bad pointer).  Benjamin?
Flags: needinfo?(benjamin)
Isn't "environ" data stored on the stack of the main thread on Linux? That might explain "Address 0x7fff6592aef9 is located in stack of thread T0 at offset 4793 in frame" but it doesn't explain why ASAN has it marked "fa" instead of one of the stack redzones.

I wonder if the static constructor is confusing ASAN because of constructor ordering.

I'm surprised that we have a static PRRWLock, especially given the care we're taking to *not* have static PRLogModules in that same file (CSFLog.cpp).
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Isn't "environ" data stored on the stack of the main thread on Linux?

Just above the stack, as per figure 2 in http://asm.sourceforge.net/articles/startup.html:

 code           .text section
 data           .data section
 bss            .bss section
 ...
 ...
 ...            free space
 stack	        stack
 arguments      program arguments
 environment    program environment
 program name   filename of program (duplicated in arguments section)
 null (dword)   final dword of zero

> That
> might explain "Address 0x7fff6592aef9 is located in stack of thread T0 at
> offset 4793 in frame" but it doesn't explain why ASAN has it marked "fa"
> instead of one of the stack redzones.

But stack redzones aren't involved -- 'fa' is for a *heap* redzone.  Which suggests that NSPR_LOG_MODULES has been modified with setenv() or putenv() and its chars are (or were) on the heap.

Actually, I'm confused as to the address of the bad read.  We have this:

 19:07:29     INFO -  ==2754==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fff6592aef9 at pc 0x7f2547c89d5f bp 0x7fff65929bf0 sp 0x7fff65929be8
 19:07:29     INFO -  READ of size 1 at 0x7fff6592aef9 thread T0

but also this:

 19:07:32     INFO -  Shadow bytes around the buggy address:
 ...
 19:07:32     INFO -  =>0x10006cb1d5d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]

So which is it -- 0x7fff6592aef9 or 0x10006cb1d5d0?
Philor: has this ever happened again, or was it just the once?
Flags: needinfo?(philringnalda)
If I knew that it had happened again, you would know because there would be a link to another log pasted in here, but the days when I looked at 100% of the test failures are long gone.
Flags: needinfo?(philringnalda)
njn or bsmedberg, is there more to investigate here or should we close this as incomplete?
I don't think there's anything that can be done here.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Blocks: 965903
https://tbpl.mozilla.org/php/getParsedLog.php?id=34271054&tree=Fx-Team

As evidenced by bug 965903 and the above log, we're still hitting this in automation. Reopening.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Summary: Intermittent ASan test_aboutmemory5.xul | Test timed out by which it actually means AddressSanitizer: heap-buffer-overflow → Intermittent ASAN mochitest timeouts from a heap-buffer-overflow in _PR_SetLogModuleLevel
https://tbpl.mozilla.org/php/getParsedLog.php?id=34284543&tree=B2g-Inbound

I'm guessing this actually happens more often than this bug indicates since they show up as generic timeouts and probably get starred as such.
It would be good if TBPL or whatever recognized ASAN errors.  They are all fairly regular.  Like in comment 11, the log has:

01:34:46     INFO -  ==3333==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fff036dfef9 at pc 0x7fcc6318dedf bp 0x7fff036de890 sp 0x7fff036de888
File a Webtools::Tinderboxpushlog bug for it. Shouldn't be too hard to tweak the log parser to recognize them better.
> File a Webtools::Tinderboxpushlog bug for it. Shouldn't be too hard to tweak
> the log parser to recognize them better.

See bug 957410 which did much the same thing for Valgrind. Having said that, highlighting Valgrind's "ERROR SUMMARY" line wasn't that good because different Valgrind errors all get lumped together, so I ended up doing bug 966864 which parses Valgrind's output.
Ryan, is this still happening as far you as can recall?  If not, I think it is worth just closing, as the investigation didn't seem to reveal anything in particular.
Flags: needinfo?(ryanvm)
Haven't seen it, no.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Flags: needinfo?(ryanvm)
Resolution: --- → INCOMPLETE
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.