Closed Bug 1556599 Opened 6 years ago Closed 4 years ago

crash at null in [@ getenv]

Categories

(Core :: IPC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr78 --- wontfix
firefox69 --- wontfix
firefox84 --- fixed
firefox85 --- fixed
firefox86 --- fixed

People

(Reporter: tsmith, Assigned: jld)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(3 obsolete files)

The fuzzers recently started reporting this (startup?) crash.

First reported m-c 20190531-b2d39cc7a2db

==31199==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fea96d8b785 bp 0x616000000c80 sp 0x7fea62e5f700 T6)
==31199==The signal is caused by a READ memory access.
==31199==Hint: address points to the zero page.
    #0 0x7fea96d8b784 in getenv /build/glibc-OTsEL5/glibc-2.27/stdlib/getenv.c:75
    #1 0x7fea96d74df8 in __gconv_load_cache /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_cache.c:56
    #2 0x7fea96d6d724 in __gconv_read_conf /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_conf.c:543
    #3 0x7fea97e94826 in __pthread_once_slow (/lib/x86_64-linux-gnu/libpthread.so.0+0xf826)
    #4 0x7fea96d6ca9e in __gconv_find_transform /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_db.c:732
    #5 0x7fea96e14f8a in __wcsmbs_getfct /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:91
    #6 0x7fea96e14f8a in __wcsmbs_load_conv /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:185
    #7 0x7fea96e151bb in get_gconv_fcts /build/glibc-OTsEL5/glibc-2.27/wcsmbs/./wcsmbsload.h:74
    #8 0x7fea96e151bb in __wcsmbs_clone_conv /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:221
    #9 0x7fea96dd0017 in _IO_fwide /build/glibc-OTsEL5/glibc-2.27/libio/iofwide.c:118
    #10 0x7fea96dcbe80 in vswprintf /build/glibc-OTsEL5/glibc-2.27/libio/vswprintf.c:108
    #11 0x7fea73caae6c in vswprintf /src/ipc/chromium/src/base/string_util_posix.h:39:10
    #12 0x7fea73caae6c in vsnprintfT /src/ipc/chromium/src/base/string_util.cc:341
    #13 0x7fea73caae6c in StringAppendVT<std::__cxx11::basic_string<wchar_t> > /src/ipc/chromium/src/base/string_util.cc:361
    #14 0x7fea73caae6c in StringAppendV(std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >*, wchar_t const*, __va_list_tag*) /src/ipc/chromium/src/base/string_util.cc:534
    #15 0x7fea73caac55 in StringPrintf[abi:cxx11](wchar_t const*, ...) /src/ipc/chromium/src/base/string_util.cc:550:3
    #16 0x7fea73cbf3fe in IPC::Channel::GenerateUniqueRandomChannelID[abi:cxx11]() /src/ipc/chromium/src/chrome/common/ipc_channel.cc:36:10
    #17 0x7fea73cccd59 in IPC::Channel::GenerateVerifiedChannelID(std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) /src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:978:20
    #18 0x7fea73cb6e5b in ChildProcessHost::CreateChannel() /src/ipc/chromium/src/chrome/common/child_process_host.cc:29:17
    #19 0x7fea73d33209 in mozilla::ipc::GeckoChildProcessHost::InitializeChannel() /src/ipc/glue/GeckoChildProcessHost.cpp:445:3
    #20 0x7fea73d31f1a in mozilla::ipc::GeckoChildProcessHost::RunPerformAsyncLaunch(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) /src/ipc/glue/GeckoChildProcessHost.cpp:597:3
    #21 0x7fea73d5a434 in applyImpl<mozilla::ipc::GeckoChildProcessHost, void (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >), StoreCopyPassByConstLRef<std::vector<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > > > , 0> /src/obj-firefox/dist/include/nsThreadUtils.h:1122:12
    #22 0x7fea73d5a434 in apply<mozilla::ipc::GeckoChildProcessHost, void (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >)> /src/obj-firefox/dist/include/nsThreadUtils.h:1128
    #23 0x7fea73d5a434 in mozilla::detail::RunnableMethodImpl<mozilla::ipc::GeckoChildProcessHost*, void (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), false, (mozilla::RunnableKind)0, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::Run() /src/obj-firefox/dist/include/nsThreadUtils.h:1174
    #24 0x7fea73c694ef in RunTask /src/ipc/chromium/src/base/message_loop.cc:442:9
    #25 0x7fea73c694ef in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) /src/ipc/chromium/src/base/message_loop.cc:450
    #26 0x7fea73c6a9eb in MessageLoop::DoWork() /src/ipc/chromium/src/base/message_loop.cc:523:13
    #27 0x7fea73c6f11c in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /src/ipc/chromium/src/base/message_pump_libevent.cc:321:31
    #28 0x7fea73c6791e in RunInternal /src/ipc/chromium/src/base/message_loop.cc:315:10
    #29 0x7fea73c6791e in RunHandler /src/ipc/chromium/src/base/message_loop.cc:308
    #30 0x7fea73c6791e in MessageLoop::Run() /src/ipc/chromium/src/base/message_loop.cc:290
    #31 0x7fea73cb1de7 in base::Thread::ThreadMain() /src/ipc/chromium/src/base/thread.cc:192:16
    #32 0x7fea73c7f7d8 in ThreadFunc(void*) /src/ipc/chromium/src/base/platform_thread_posix.cc:40:13
    #33 0x7fea97e8c6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #34 0x7fea96e6a88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

MessagePumpLibevent means this is the IPC I/O thread; also, GeckoChildProcessHost::RunPerformAsyncLaunch is called only on that thread.

I suspect that this vswprintf → locale → getenv path is racing with another thread (the main thread?) calling setenv or PR_SetEnv. Note that NSPR has locking to protect PR_GetEnv from PR_SetEnv, but env var consumers within libc or other system libraries don't use that.

I'd have to check the glibc source to verify it, but the locale part of this might be a one-time initialization (note the __pthread_once_slow), so maybe we could work around this particular problem by forcing it to happen earlier in startup. (It might also be important that this is a wide-char printf, vs. using normal printf and transcoding; see also bug 1447674.) However, this suggests a larger class of possible bugs.

See Also: → 1447674

The priority flag is not set for this bug.
:jld, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jld)
Whiteboard: sb?
Priority: -- → P5
Whiteboard: sb?

Tyson, how reproducible is this crash? Is this something you could pin down under rr? That might make it easy to see where the setenv/PR_SetEnv side of this failure was, which would make a fix much easier. (Because the current fix would be to get rid of all of them)

Flags: needinfo?(twsmith)

This issue was reported frequently up until it was last reported by m-c 20190703-5f0961efaa1c and the fuzzers have not seen it since. Is it worth spending time on this one? is it happening somewhere else?

Flags: needinfo?(twsmith)

(In reply to Tyson Smith [:tsmith] (PTO Jan 6 - 15) from comment #4)

This issue was reported frequently up until it was last reported by m-c 20190703-5f0961efaa1c and the fuzzers have not seen it since. Is it worth spending time on this one? is it happening somewhere else?

Bug 1557113 appears to be a duplicate of this, and it's been happening a few times per week on CI.

Flags: needinfo?(jld)

One thing that can be done here is to get rid of “channel IDs” on Unix. What they actually are is Windows named pipe paths, which is why they're wstrings; on Unix we use the more modern capability-passing model of socketpair and SCM_RIGHTS. We still generate wstring names in that case, because the theoretically OS-independent internal APIs require them, but they're never used for anything.

That won't fix the underlying problem of racing on the environment, but if nothing else uses *wprintf (and we try not to; see bug 1447674) then it might at least remove this particular instance of the problem, and I was already looking into cleaning this up in connection with bug 1607153.

The PipeMap class tries to simulate the Windows channel model (named
pipes that the client opens by a pathname) on Unix. However, it's
effectively dead code -- the map is empty except in some unit tests that
we never imported.

What we do is generate a "channel ID" with string formatting, then don't
pass it to the child or ever insert anything into the map, then the child
looks up an empty string and doesn't find it, so it uses the hard-coded
fixed fd for the initial channel.

Basically, it does nothing except maybe confuse unfamiliar readers, so
let's get rid of it.

Assignee: nobody → jld
Status: NEW → ASSIGNED

This "create a pipe" operation has a mode where, on Unix, it doesn't
create a new transport but rather uses a hard-coded fd for the initial
IPC channel in a child process. (It was originally written for Windows
and the assumption of using named pipes and pathnames for everything.)

That seems like a footgun, so this patch checks for trying to "create"
that pipe twice. However, it doesn't check for accidentally calling it
in the parent process.

The Chromium-derived IPC code was, as far as I can tell, originally
designed for Windows and assumed that channels would be named pipes,
managed and connected to via std::wstring paths. The port to Unix,
however, used unnamed socketpair() and passed them directly from
process to process, so it has no use for these channel IDs... but it
still computes and propagates them, even though they're not used, using
deprecated wide-string APIs.

This patch introduces a typedef for an abstract channel ID, which is a
wstring on Windows and an empty struct on Unix, to allow removing the
string code where it's not needed without needing to completely redesign
the channel abstraction.

Comment on attachment 9142899 [details]
Bug 1556599 - Part 1: Remove IPC PipeMap.

Revision D72258 was moved to bug 1632687. Setting attachment 9142899 [details] to obsolete.

Attachment #9142899 - Attachment is obsolete: true

Comment on attachment 9142900 [details]
Bug 1556599 - Part 1.5: Protect the hard-coded IPC child fd from accidental multiple use.

Revision D72259 was moved to bug 1632687. Setting attachment 9142900 [details] to obsolete.

Attachment #9142900 - Attachment is obsolete: true

Comment on attachment 9142901 [details]
Bug 1556599 - Part 2: Introduce an OS-dependent ChannelId type to reflect that Unix doesn't use channel IDs.

Revision D72260 was moved to bug 1632687. Setting attachment 9142901 [details] to obsolete.

Attachment #9142901 - Attachment is obsolete: true

(Sorry for the noise — I was originally going to land the patches on this bug, but I didn't like the idea of making a significant change under a bug that was filed for something that's like a convenient side-effect than the actual purpose of the patches, so I opened a new bug… and forgot to fix the commit messages.)

There are 32 total failures in the last 7 days on

  • linux1804-64-asan opt
  • linux1804-64-asan-qr opt

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302644750&repo=autoland&lineNumber=7924

[task 2020-05-17T14:29:18.013Z] 14:29:18 INFO - REFTEST TEST-START | layout/reftests/view-source/view-source-image.html == layout/reftests/view-source/view-source-image-ref.html
[task 2020-05-17T14:29:18.015Z] 14:29:18 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/view-source/view-source-image.html | 0 / 1 (0%)
[task 2020-05-17T14:29:19.086Z] 14:29:19 INFO - REFTEST INFO | drawWindow flags = DRAWWINDOW_DRAW_CARET | DRAWWINDOW_DRAW_VIEW | DRAWWINDOW_USE_WIDGET_LAYERS; window size = 800,1000; test browser size = 800,1000
[task 2020-05-17T14:29:19.211Z] 14:29:19 INFO - REFTEST TEST-LOAD | file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/view-source/view-source-image-ref.html | 0 / 1 (0%)
[task 2020-05-17T14:29:19.431Z] 14:29:19 INFO - REFTEST TEST-KNOWN-FAIL | layout/reftests/view-source/view-source-image.html == layout/reftests/view-source/view-source-image-ref.html | image comparison, max difference: 255, number of differing pixels: 9078
[task 2020-05-17T14:29:19.435Z] 14:29:19 INFO - REFTEST TEST-END | layout/reftests/view-source/view-source-image.html == layout/reftests/view-source/view-source-image-ref.html
[task 2020-05-17T14:29:19.456Z] 14:29:19 INFO - REFTEST INFO | Slowest test took 384ms (file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/view-source/view-source-image.html)
[task 2020-05-17T14:29:19.457Z] 14:29:19 INFO - REFTEST INFO | Total canvas count = 2
[task 2020-05-17T14:29:20.626Z] 14:29:20 INFO - 1589725760622 Marionette TRACE Received observer notification xpcom-will-shutdown
[task 2020-05-17T14:29:20.626Z] 14:29:20 INFO - 1589725760622 Marionette INFO Stopped listening on port 2828
[task 2020-05-17T14:29:20.626Z] 14:29:20 INFO - 1589725760623 Marionette DEBUG Marionette stopped listening
[task 2020-05-17T14:29:20.969Z] 14:29:20 INFO - ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-05-17T14:29:21.733Z] 14:29:21 INFO - REFTEST INFO | Process mode: e10s
[task 2020-05-17T14:29:21.733Z] 14:29:21 WARNING - leakcheck | refcount logging is off, so leaks can't be detected!
[task 2020-05-17T14:29:21.734Z] 14:29:21 INFO - REFTEST INFO | Running tests in file:///builds/worker/workspace/build/tests/reftest/tests/layout/reftests/w3c-css/submitted/multicol3/reftest.list
[task 2020-05-17T14:29:21.755Z] 14:29:21 INFO - REFTEST INFO | INFO | runtests.py | ASan using symbolizer at /builds/worker/workspace/build/application/firefox/llvm-symbolizer
[task 2020-05-17T14:29:21.763Z] 14:29:21 INFO - REFTEST INFO | INFO | runtests.py | ASan running in default memory configuration
[task 2020-05-17T14:29:21.763Z] 14:29:21 INFO - REFTEST INFO | UBSan enabled.
[task 2020-05-17T14:29:21.763Z] 14:29:21 INFO - REFTEST INFO | Running with e10s: True
[task 2020-05-17T14:29:21.763Z] 14:29:21 INFO - REFTEST INFO | Application command: /builds/worker/workspace/build/application/firefox/firefox -marionette -profile /tmp/tmpjRoMfS.mozrunner
[task 2020-05-17T14:29:22.992Z] 14:29:22 INFO - AddressSanitizer:DEADLYSIGNAL
[task 2020-05-17T14:29:22.993Z] 14:29:22 INFO - =================================================================
[task 2020-05-17T14:29:22.994Z] 14:29:22 ERROR - ==8010==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f367e533785 bp 0x616000010b80 sp 0x7f3658e40700 T4)
[task 2020-05-17T14:29:22.994Z] 14:29:22 INFO - ==8010==The signal is caused by a READ memory access.
[task 2020-05-17T14:29:22.994Z] 14:29:22 INFO - ==8010==Hint: this fault was caused by a dereference of a high value address (see register values below). Dissassemble the provided pc to learn which register was used.
[task 2020-05-17T14:29:23.170Z] 14:29:23 INFO - #0 0x7f367e533785 in getenv /build/glibc-OTsEL5/glibc-2.27/stdlib/getenv.c:84
[task 2020-05-17T14:29:23.172Z] 14:29:23 INFO - #1 0x7f367e51cdf8 in __gconv_load_cache /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_cache.c:56
[task 2020-05-17T14:29:23.173Z] 14:29:23 INFO - #2 0x7f367e515724 in __gconv_read_conf /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_conf.c:543
[task 2020-05-17T14:29:23.195Z] 14:29:23 INFO - #3 0x7f367f63c826 in __pthread_once_slow /build/glibc-OTsEL5/glibc-2.27/nptl/pthread_once.c:116
[task 2020-05-17T14:29:23.195Z] 14:29:23 INFO - #4 0x7f367e514a9e in __gconv_find_transform /build/glibc-OTsEL5/glibc-2.27/iconv/gconv_db.c:732
[task 2020-05-17T14:29:23.195Z] 14:29:23 INFO - #5 0x7f367e5bcf8a in __wcsmbs_getfct /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:91
[task 2020-05-17T14:29:23.195Z] 14:29:23 INFO - #6 0x7f367e5bcf8a in __wcsmbs_load_conv /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:185
[task 2020-05-17T14:29:23.197Z] 14:29:23 INFO - #7 0x7f367e5bd1bb in get_gconv_fcts /build/glibc-OTsEL5/glibc-2.27/wcsmbs/./wcsmbsload.h:74
[task 2020-05-17T14:29:23.197Z] 14:29:23 INFO - #8 0x7f367e5bd1bb in __wcsmbs_clone_conv /build/glibc-OTsEL5/glibc-2.27/wcsmbs/wcsmbsload.c:221
[task 2020-05-17T14:29:23.197Z] 14:29:23 INFO - #9 0x7f367e578017 in _IO_fwide /build/glibc-OTsEL5/glibc-2.27/libio/iofwide.c:118
[task 2020-05-17T14:29:23.197Z] 14:29:23 INFO - #10 0x7f367e573e80 in vswprintf /build/glibc-OTsEL5/glibc-2.27/libio/vswprintf.c:108
[task 2020-05-17T14:29:23.574Z] 14:29:23 INFO - #11 0x7f3660176fdd in vswprintf /builds/worker/checkouts/gecko/ipc/chromium/src/base/string_util_posix.h:39:10
[task 2020-05-17T14:29:23.575Z] 14:29:23 INFO - #12 0x7f3660176fdd in vsnprintfT /builds/worker/checkouts/gecko/ipc/chromium/src/base/string_util.cc:341:10
[task 2020-05-17T14:29:23.576Z] 14:29:23 INFO - #13 0x7f3660176fdd in StringAppendVT<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > > /builds/worker/checkouts/gecko/ipc/chromium/src/base/string_util.cc:361:16
[task 2020-05-17T14:29:23.576Z] 14:29:23 INFO - #14 0x7f3660176fdd in StringAppendV(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, wchar_t const, __va_list_tag*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/string_util.cc:534:3
[task 2020-05-17T14:29:23.577Z] 14:29:23 INFO - #15 0x7f3660176de4 in StringPrintf(wchar_t const*, ...) /builds/worker/checkouts/gecko/ipc/chromium/src/base/string_util.cc:550:3
[task 2020-05-17T14:29:23.578Z] 14:29:23 INFO - #16 0x7f366017b12e in GenerateUniqueRandomChannelID /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel.cc:36:10
[task 2020-05-17T14:29:23.578Z] 14:29:23 INFO - #17 0x7f366017b12e in IPC::Channel::GenerateVerifiedChannelID(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:978:20
[task 2020-05-17T14:29:23.584Z] 14:29:23 INFO - #18 0x7f366017ad01 in ChildProcessHost::CreateChannel() /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/child_process_host.cc:29:17
[task 2020-05-17T14:29:23.599Z] 14:29:23 INFO - #19 0x7f36601d8dc5 in mozilla::ipc::GeckoChildProcessHost::InitializeChannel() /builds/worker/checkouts/gecko/ipc/glue/GeckoChildProcessHost.cpp:816:3
[task 2020-05-17T14:29:23.600Z] 14:29:23 INFO - #20 0x7f36601d8264 in mozilla::ipc::BaseProcessLauncher::Launch(mozilla::ipc::GeckoChildProcessHost*) /builds/worker/checkouts/gecko/ipc/glue/GeckoChildProcessHost.cpp:1821:10
[task 2020-05-17T14:29:23.600Z] 14:29:23 INFO - #21 0x7f3660203b7b in applyImpl<mozilla::ipc::BaseProcessLauncher, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false> > (mozilla::ipc::BaseProcessLauncher::)(mozilla::ipc::GeckoChildProcessHost ), StorePtrPassByPtr<mozilla::ipc::GeckoChildProcessHost> , 0> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1185:12
[task 2020-05-17T14:29:23.600Z] 14:29:23 INFO - #22 0x7f3660203b7b in apply<mozilla::ipc::BaseProcessLauncher, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false> > (mozilla::ipc::BaseProcessLauncher::
)(mozilla::ipc::GeckoChildProcessHost )> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1191:12
[task 2020-05-17T14:29:23.601Z] 14:29:23 INFO - #23 0x7f3660203b7b in mozilla::detail::MethodCall<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false>, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false> > (mozilla::ipc::BaseProcessLauncher::
)(mozilla::ipc::GeckoChildProcessHost
), mozilla::ipc::BaseProcessLauncher, mozilla::ipc::GeckoChildProcessHost*>::Invoke() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1413:47
[task 2020-05-17T14:29:23.601Z] 14:29:23 INFO - #24 0x7f36602036dd in mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false>, RefPtr<mozilla::MozPromise<mozilla::ipc::LaunchResults, mozilla::ipc::LaunchError, false> > (mozilla::ipc::BaseProcessLauncher::)(mozilla::ipc::GeckoChildProcessHost), mozilla::ipc::BaseProcessLauncher, mozilla::ipc::GeckoChildProcessHost*>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1433:42
[task 2020-05-17T14:29:23.601Z] 14:29:23 INFO - #25 0x7f3660158adb in MessageLoop::RunTask(already_AddRefed<nsIRunnable>) /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:442:9
[task 2020-05-17T14:29:23.602Z] 14:29:23 INFO - #26 0x7f3660159938 in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:450:5
[task 2020-05-17T14:29:23.602Z] 14:29:23 INFO - #27 0x7f366015a17b in MessageLoop::DoWork() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:523:13
[task 2020-05-17T14:29:23.602Z] 14:29:23 INFO - #28 0x7f366015cb90 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_pump_libevent.cc:321:31
[task 2020-05-17T14:29:23.602Z] 14:29:23 INFO - #29 0x7f36601586c1 in RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:315:10
[task 2020-05-17T14:29:23.603Z] 14:29:23 INFO - #30 0x7f36601586c1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:308:3
[task 2020-05-17T14:29:23.603Z] 14:29:23 INFO - #31 0x7f36601586c1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:290:3
[task 2020-05-17T14:29:23.603Z] 14:29:23 INFO - #32 0x7f3660178a35 in base::Thread::ThreadMain() /builds/worker/checkouts/gecko/ipc/chromium/src/base/thread.cc:192:16
[task 2020-05-17T14:29:23.604Z] 14:29:23 INFO - #33 0x7f3660169e6c in ThreadFunc(void*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:40:13
[task 2020-05-17T14:29:23.604Z] 14:29:23 INFO - #34 0x7f367f6346da in start_thread /build/glibc-OTsEL5/glibc-2.27/nptl/pthread_create.c:463
[task 2020-05-17T14:29:23.604Z] 14:29:23 INFO - #35 0x7f367e61288e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[task 2020-05-17T14:29:23.604Z] 14:29:23 INFO - AddressSanitizer can not provide additional info.
[task 2020-05-17T14:29:23.605Z] 14:29:23 INFO - SUMMARY: AddressSanitizer: SEGV /build/glibc-OTsEL5/glibc-2.27/stdlib/getenv.c:84 in getenv
[task 2020-05-17T14:29:23.605Z] 14:29:23 INFO - Thread T4 (Gecko_IOThread) created by T0 here:
[task 2020-05-17T14:29:23.623Z] 14:29:23 INFO - #0 0x558ab243ba9a in pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cpp:214:3
[task 2020-05-17T14:29:23.624Z] 14:29:23 INFO - #1 0x7f366016539c in CreateThread /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:123:14
[task 2020-05-17T14:29:23.625Z] 14:29:23 INFO - #2 0x7f366016539c in PlatformThread::Create(unsigned long, PlatformThread::Delegate*, unsigned long*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:134:10
[task 2020-05-17T14:29:23.626Z] 14:29:23 INFO - #3 0x7f366017824d in base::Thread::StartWithOptions(base::Thread::Options const&) /builds/worker/checkouts/gecko/ipc/chromium/src/base/thread.cc:97:8
[task 2020-05-17T14:29:23.627Z] 14:29:23 INFO - #4 0x7f365f194ad1 in NS_InitXPCOM /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:317:9
[task 2020-05-17T14:29:23.628Z] 14:29:23 INFO - #5 0x7f3669d5c156 in ScopedXPCOMStartup::Initialize(bool) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:1311:8
[task 2020-05-17T14:29:23.628Z] 14:29:23 INFO - #6 0x7f3669d74a24 in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4749:22
[task 2020-05-17T14:29:23.630Z] 14:29:23 INFO - #7 0x7f3669d75363 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4807:21
[task 2020-05-17T14:29:23.630Z] 14:29:23 INFO - #8 0x558ab2483ccb in do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:217:22
[task 2020-05-17T14:29:23.631Z] 14:29:23 INFO - #9 0x558ab2483ccb in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:331:16
[task 2020-05-17T14:29:23.632Z] 14:29:23 INFO - #10 0x7f367e512b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
[task 2020-05-17T14:29:23.633Z] 14:29:23 INFO - ==8010==ABORTING
[task 2020-05-17T14:32:21.838Z] 14:32:21 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
[task 2020-05-17T14:32:21.840Z] 14:32:21 INFO - Traceback (most recent call last):
[task 2020-05-17T14:32:21.841Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 1038, in <module>
[task 2020-05-17T14:32:21.842Z] 14:32:21 INFO - sys.exit(run_test_harness(parser, options))
[task 2020-05-17T14:32:21.843Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 1030, in run_test_harness
[task 2020-05-17T14:32:21.844Z] 14:32:21 INFO - result = reftest.runTests(options.tests, options)
[task 2020-05-17T14:32:21.845Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 621, in runTests
[task 2020-05-17T14:32:21.846Z] 14:32:21 INFO - return self.runSerialTests(manifests, options, cmdargs)
[task 2020-05-17T14:32:21.847Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 978, in runSerialTests
[task 2020-05-17T14:32:21.848Z] 14:32:21 INFO - status = run(tests=tests)
[task 2020-05-17T14:32:21.849Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 952, in run
[task 2020-05-17T14:32:21.850Z] 14:32:21 INFO - **kwargs)
[task 2020-05-17T14:32:21.851Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 884, in runApp
[task 2020-05-17T14:32:21.852Z] 14:32:21 INFO - raise reraise(exc, value, tb)
[task 2020-05-17T14:32:21.853Z] 14:32:21 INFO - File "/builds/worker/workspace/build/tests/reftest/runreftest.py", line 847, in runApp
[task 2020-05-17T14:32:21.854Z] 14:32:21 INFO - marionette.start_session()
[task 2020-05-17T14:32:21.855Z] 14:32:21 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 36, in _
[task 2020-05-17T14:32:21.856Z] 14:32:21 INFO - m._handle_socket_failure()
[task 2020-05-17T14:32:21.857Z] 14:32:21 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 653, in _handle_socket_failure
[task 2020-05-17T14:32:21.858Z] 14:32:21 INFO - reraise(exc_cls, exc, tb)
[task 2020-05-17T14:32:21.859Z] 14:32:21 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 26, in _
[task 2020-05-17T14:32:21.860Z] 14:32:21 INFO - return func(*args, **kwargs)
[task 2020-05-17T14:32:21.861Z] 14:32:21 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1111, in start_session
[task 2020-05-17T14:32:21.862Z] 14:32:21 INFO - self.raise_for_port(timeout=timeout)
[task 2020-05-17T14:32:21.863Z] 14:32:21 INFO - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 572, in raise_for_port
[task 2020-05-17T14:32:21.864Z] 14:32:21 INFO - self.host, self.port))
[task 2020-05-17T14:32:21.865Z] 14:32:21 INFO - socket.timeout: Timed out waiting for connection on 127.0.0.1:2828!
[task 2020-05-17T14:32:21.893Z] 14:32:21 ERROR - Return code: 1
[task 2020-05-17T14:32:21.901Z] 14:32:21 ERROR - No suite end message was emitted by this harness.
[task 2020-05-17T14:32:21.902Z] 14:32:21 INFO - TinderboxPrint: reftest-reftest<br/>2136/0/40
[task 2020-05-17T14:32:21.902Z] 14:32:21 ERROR - # TBPL FAILURE #
[task 2020-05-17T14:32:21.902Z] 14:32:21 WARNING - setting return code to 2

Jed, can you take a look?

Flags: needinfo?(jld)
Whiteboard: [stockwell needswork:owner]

Jed, can you take a look?

As already explained in the comments above, the hope is that https://bugzilla.mozilla.org/show_bug.cgi?id=1632687 will hide this problem.

Flags: needinfo?(jld)

This bug failed 55 times in the last 7 days. Occurs on linux1804-64-asan and linux1804-64-asan-qr on opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=308492653&repo=autoland&lineNumber=12738

Jed: Can you have someone take a look at this bug?

Flags: needinfo?(jld)

FWIW: The fuzzers have been hitting this during browser startup again. I see a few reports per day since at least the beginning of June.

(In reply to Tyson Smith [:tsmith] from comment #31)

FWIW: The fuzzers have been hitting this during browser startup again. I see a few reports per day since at least the beginning of June.

I take that back. The dates were from exactly one year ago (July 4 UTC)!

There are 27 failure in the past 7 days. Happens on linux1804-64-asan-qr and linux1804-64-asan opt builds.

Recent failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=309392306&repo=autoland&lineNumber=6531

Bug 1632687 landed. If all goes well, we'll stop seeing this on m-c.

Flags: needinfo?(jld)

Appears to be fixed in 80 as expected, but I don't think the fix is suitable for uplift.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
See Also: → 1688989
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: