Closed Bug 1441694 Opened 7 years ago Closed 3 years ago

startup Crash in res_nsearch_2 or dns_res_send if chat account logs in at startup

Categories

(Thunderbird :: Instant Messaging, defect)

Unspecified
macOS
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: unicorn.consulting, Unassigned)

Details

(Keywords: crash, Whiteboard: [startupcrash][rare])

User Story

Bug filed from support request in URL.
This bug was filed from the Socorro interface and is
report bp-29cb54eb-5ea6-4fad-a03e-7bc220180227.
=============================================================

Top 10 frames of crashing thread:

0 libresolv.9.dylib res_nsearch_2 
1 libresolv.9.dylib res_9_nsearch 
2 XUL ffi_call_unix64 
3 XUL ffi_call js/src/ctypes/libffi/src/x86/ffi64.c:535
4 XUL js::ctypes::FunctionType::Call js/src/ctypes/CTypes.cpp:7143
5 XUL js::InternalCallOrConstruct js/src/jscntxtinlines.h:239
6 XUL Interpret js/src/vm/Interpreter.cpp:510
7 XUL js::RunScript js/src/vm/Interpreter.cpp:405
8 XUL js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:477
9 XUL <name omitted> js/src/vm/Interpreter.cpp:523

=============================================================
Patrick, perhaps this is a chat bug?

https://support.mozilla.org/en-US/questions/1207124 Mac crashes when two chat accounts are set to auto-login (this is the user in comment 0)

bp-a2a604df-cb43-4d31-ab25-208980180212 is probably from the same user (it has the same install time): "It only happens when I set two chat accounts to log-on at startup. When I disable either one it doesn't crash. Also, it only happens on Mac, not on PC. And it's happened for the last year or so."

Only 10 crashes per week for 52.6.0, so a very low crash rate. As far as I can tell this is not a new crash. In the past two months there have been a few 58.0b2 crashes and one 58.0b3 crash. None so far for 59.0b1.

The stacks actually start (at the bottom, where the thread begins) with:
35	XUL	NS_ProcessNextEvent(nsIThread*, bool)	xpcom/glue/nsThreadUtils.cpp:361
36	XUL	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)	ipc/glue/MessagePump.cpp:368
37	XUL	MessageLoop::Run()	ipc/chromium/src/base/message_loop.cc:232
38	XUL	nsThread::ThreadFunc(void*)	xpcom/threads/nsThread.cpp:467
39	libnss3.dylib	_pt_root	nsprpub/pr/src/pthreads/ptthread.c:216
40	libsystem_pthread.dylib	_pthread_body	
41	libsystem_pthread.dylib	_pthread_start	
42	libsystem_pthread.dylib	thread_start	
43	libnss3.dylib	libnss3.dylib@0x1b83df


A couple of users also crash with dns_res_send (bp-5dee46e8-4d2c-493b-9fd9-1525a0171229, 52.4.0), which has a similar stack, so I suspect the same root cause:
0 	libresolv.9.dylib	dns_res_send	
1 	libresolv.9.dylib	res_9_nsend_2	
2 	libresolv.9.dylib	res_nquery_soa_min	
3 	libresolv.9.dylib	res_nquerydomain_2	
4 	libresolv.9.dylib	res_nsearch_2	
5 	libresolv.9.dylib	res_9_nsearch	
6 	XUL	ffi_call_unix64	
7 	XUL	ffi_call	js/src/ctypes/libffi/src/x86/ffi64.c:535
8 	XUL	js::ctypes::FunctionType::Call	js/src/ctypes/CTypes.cpp:7143 
...
40 	XUL	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
41 	XUL	MessageLoop::Run()
42 	XUL	nsThread::ThreadFunc(void*)
43 	libnss3.dylib	_pt_root
44 	libsystem_pthread.dylib	_pthread_body
45 	libsystem_pthread.dylib	_pthread_start
46 	libsystem_pthread.dylib	thread_start
47 	libnss3.dylib	libnss3.dylib@0x1b83df
Crash Signature: [@ res_nsearch_2] → [@ res_nsearch_2] [@ dns_res_send ]
Component: Account Manager → Security
Flags: needinfo?(clokep)
Component: Security → Instant Messaging
Summary: Crash in res_nsearch_2 → startup Crash in res_nsearch_2 if chat account logs in at startup
Whiteboard: [startupcrash]

Interesting. To me this looks like it might come from our DNS SRV code (https://dxr.mozilla.org/comm-central/source/comm/chat/modules/DNS.jsm). I wonder whether something there doesn't allow multiple requests to be in flight at once.

Philipp, do you know if this code was written to allow for multiple in-flight requests at once? Some of the lookup code gets a bit thick for me! Unfortunately the stack doesn't seem to go into JavaScript land.

Flags: needinfo?(clokep) → needinfo?(philipp)

Based on http://man7.org/linux/man-pages/man3/res_nsearch.3.html, it seems res_search is not thread-safe, so this crash could be a result of that. Maybe we could use res_nsearch instead? I'm not sure it is always available. Otherwise, we need to queue up the requests.
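To make the "queue up the requests" option concrete, here is a minimal sketch in Python (not the JavaScript used by DNS.jsm) that puts every resolver call behind one lock, so at most one lookup is in flight at a time. `unsafe_lookup`, `serialized_lookup`, `lookup_all` and `_lookup_lock` are hypothetical names; `unsafe_lookup` deliberately keeps its state in a shared module-level variable to mimic a resolver that uses one global state instead of a per-call one. It is not the real libresolv API, and where such a queue would live in DNS.jsm is not shown here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical stand-in for a non-thread-safe resolver call: it stores the
# query in shared module-level state and reads it back, like a function that
# relies on one global resolver state instead of a per-call one.
_shared_state = {"name": None}


def unsafe_lookup(name: str) -> str:
    _shared_state["name"] = name
    # Without a lock, another thread could overwrite _shared_state here and
    # this call would return the wrong answer (in C, a race like this could
    # corrupt shared buffers instead).
    return "answer-for-" + _shared_state["name"]


# One process-wide lock: every lookup must acquire it first, so concurrent
# requests wait their turn and only one is ever in flight.
_lookup_lock = threading.Lock()


def serialized_lookup(name: str) -> str:
    with _lookup_lock:
        return unsafe_lookup(name)


def lookup_all(names: List[str]) -> List[str]:
    # Simulate several chat accounts resolving at startup on separate threads.
    # pool.map returns results in the same order as `names`.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(serialized_lookup, names))
```

For example, `lookup_all(["a.example.com", "b.example.org"])` returns one answer per name, in order. A single lock is the simplest possible queue: it gives up parallel lookups in exchange for safety, which seems an acceptable trade when only a couple of accounts resolve at startup.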

Flags: needinfo?(philipp)

Hmm, it seems res_nsearch is already being used under the hood, so my previous comment may be wrong. If this crash is reproducible, I think we should instead check whether all buffers are correctly allocated.
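As a rough illustration of what "correctly allocated" buffers could mean, here is a Python `ctypes` sketch (ctypes plays the same role as the js-ctypes/libffi bridge in the stack above; this is not the DNS.jsm code). It assumes a BIND-style `res_nsearch` contract: the caller passes an answer buffer plus its length, the call returns -1 on failure, and the returned length may be larger than the buffer when the answer was truncated. `search_answer`, `fake_nsearch` and `ANSWER_SIZE` are hypothetical; `fake_nsearch` stands in for the real resolver so the sketch runs without network access.

```python
import ctypes
from typing import Callable, Optional

# Assumed answer buffer size in bytes: 65535 is the largest DNS message,
# so a truncated answer should not happen with this default.
ANSWER_SIZE = 65535

# Shape of a res_nsearch-like callable: (name, buffer, buffer_len) -> length.
Resolver = Callable[[bytes, ctypes.Array, int], int]


def search_answer(resolve: Resolver, name: bytes,
                  size: int = ANSWER_SIZE) -> Optional[bytes]:
    # Allocate a fresh buffer for each lookup instead of sharing one.
    buf = ctypes.create_string_buffer(size)
    n = resolve(name, buf, size)
    if n < 0:
        return None  # the lookup failed
    # The reported length can exceed the buffer when the answer was
    # truncated; never read past the bytes that were actually allocated.
    return buf.raw[:min(n, size)]


def fake_nsearch(name: bytes, buf: ctypes.Array, buflen: int) -> int:
    # Hypothetical resolver: the "full" answer is 600 bytes, but it only
    # copies as much as fits and then reports the full length.
    answer = b"\xab" * 600
    ctypes.memmove(buf, answer, min(len(answer), buflen))
    return len(answer)
```

With a 512-byte buffer the fake resolver reports 600 bytes, and `search_answer` returns only the 512 that exist. Python slicing would clamp on its own; the explicit `min` mirrors the check that C or js-ctypes code would need, where reading `n` bytes would run past the buffer.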

Hey Paul, you were looking at this code recently. Any idea what's happening here?

Flags: needinfo?(paul)

As I mentioned on IRC the other day, I haven't had a chance to look into this yet. Will add it to my list.

Flags: needinfo?(paul)

Is it really likely this is Mac-only?

Flags: needinfo?(clokep)

Windows uses a completely different implementation. It is possible that this bug also affects Linux; see https://dxr.mozilla.org/comm-central/rev/2a29ee0adb310b54a6a2df72034953fed8f2b043/comm/chat/modules/DNS.jsm#310-313

Flags: needinfo?(clokep)
Whiteboard: [startupcrash] → [startupcrash][rare]
Severity: critical → S3
Summary: startup Crash in res_nsearch_2 if chat account logs in at startup → startup Crash in res_nsearch_2 or dns_res_send if chat account logs in at startup

Matt, does this still happen when using a current version, and is it reproducible for you?

Both signatures still occur in version 91.

Flags: needinfo?(unicorn.consulting)

I have deleted all my chat accounts from Thunderbird, so I really cannot assist with this any longer, Wayne.

To resolve my issues with account ordering, other messed-up layout decisions, and internet connectivity, I am minimising my Thunderbird accounts to only the essentials.

Flags: needinfo?(unicorn.consulting)

Both signatures occur only on older macOS versions, so I'm going to declare this issue resolved.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME