Closed Bug 1441694 Opened 8 years ago Closed 6 months ago

startup Crash in res_nsearch_2 or dns_res_send if chat account logs in at startup

Categories

(Thunderbird :: Instant Messaging, defect)

Unspecified
macOS
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: unicorn.consulting, Unassigned)

Details

(Keywords: crash, Whiteboard: [startupcrash][rare])

User Story

Bug filed from support request in URL.
This bug was filed from the Socorro interface and is report bp-29cb54eb-5ea6-4fad-a03e-7bc220180227.
=============================================================
Top 10 frames of crashing thread:

0 libresolv.9.dylib res_nsearch_2
1 libresolv.9.dylib res_9_nsearch
2 XUL ffi_call_unix64
3 XUL ffi_call js/src/ctypes/libffi/src/x86/ffi64.c:535
4 XUL js::ctypes::FunctionType::Call js/src/ctypes/CTypes.cpp:7143
5 XUL js::InternalCallOrConstruct js/src/jscntxtinlines.h:239
6 XUL Interpret js/src/vm/Interpreter.cpp:510
7 XUL js::RunScript js/src/vm/Interpreter.cpp:405
8 XUL js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:477
9 XUL <name omitted> js/src/vm/Interpreter.cpp:523
=============================================================

Patrick, perhaps this is a chat bug?

https://support.mozilla.org/en-US/questions/1207124 Mac crashes when two chat accounts are set to auto-login (this is the user in comment 0)
bp-a2a604df-cb43-4d31-ab25-208980180212 is probably the same user (has the same install time)

"It only happens when I set two chat accounts to log-on at startup. When I disable either one it doesn't crash. Also, it only happens on Mac, not on PC. And it's happened for the last year or so."

Only 10 crashes per week for 52.6.0, so very low crash rate.

As far as I can tell this is not a new crash. In the past two months there have been a few 58.0b2 crashes and one 58.0b3. None so far for 59.0b1.

The stacks actually start with

35 XUL NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:361
36 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:368
37 XUL MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:232
38 XUL nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:467
39 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:216
40 libsystem_pthread.dylib _pthread_body
41 libsystem_pthread.dylib _pthread_start
42 libsystem_pthread.dylib thread_start
43 libnss3.dylib libnss3.dylib@0x1b83df

A couple of users also crash with dns_res_send, e.g. bp-5dee46e8-4d2c-493b-9fd9-1525a0171229 (52.4.0), which has a similar stack, so I suspect the same root cause

0 libresolv.9.dylib dns_res_send
1 libresolv.9.dylib res_9_nsend_2
2 libresolv.9.dylib res_nquery_soa_min
3 libresolv.9.dylib res_nquerydomain_2
4 libresolv.9.dylib res_nsearch_2
5 libresolv.9.dylib res_9_nsearch
6 XUL ffi_call_unix64
7 XUL ffi_call js/src/ctypes/libffi/src/x86/ffi64.c:535
8 XUL js::ctypes::FunctionType::Call js/src/ctypes/CTypes.cpp:7143
...
40 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
41 XUL MessageLoop::Run()
42 XUL nsThread::ThreadFunc(void*)
43 libnss3.dylib _pt_root
44 libsystem_pthread.dylib _pthread_body
45 libsystem_pthread.dylib _pthread_start
46 libsystem_pthread.dylib thread_start
47 libnss3.dylib libnss3.dylib@0x1b83df

Crash Signature: [@ res_nsearch_2] → [@ res_nsearch_2] [@ dns_res_send ]
Component: Account Manager → Security
Flags: needinfo?(clokep)
Component: Security → Instant Messaging
Summary: Crash in res_nsearch_2 → startup Crash in res_nsearch_2 if chat account logs in at startup
Whiteboard: [startupcrash]

Interesting. Looks to me like this might be from our DNS SRV code. (https://dxr.mozilla.org/comm-central/source/comm/chat/modules/DNS.jsm). I wonder if there's a condition that doesn't allow multiple requests in flight at once.

Philipp, do you know if this code was written to allow for multiple in-flight requests at once? Some of the lookup code gets a bit thick for me! Unfortunately the stack doesn't seem to go into JavaScript land.

Flags: needinfo?(clokep) → needinfo?(philipp)

Based on http://man7.org/linux/man-pages/man3/res_nsearch.3.html it seems res_search is not thread-safe, so the crash could be a result of that. Maybe we could use res_nsearch instead? I'm not sure it is always available. Otherwise we need to queue up the requests.
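
For illustration, "queue up the requests" could look roughly like the promise chain below, which lets only one lookup run at a time. This is a minimal sketch, not the DNS.jsm code: SerialQueue, fakeSrvLookup and the example.org / example.net names are hypothetical, and a timer stands in for the real libresolv call made through js-ctypes.

```javascript
// Hypothetical helper (not part of DNS.jsm): runs async tasks strictly one at a time.
class SerialQueue {
  constructor() {
    // Tail of the chain; every new task waits for this promise to settle.
    this._tail = Promise.resolve();
  }

  // Enqueue a function returning a value or a promise.
  // The returned promise resolves or rejects with that task's own result.
  enqueue(task) {
    const run = this._tail.then(() => task());
    // Swallow failures on the chain itself so later tasks still run.
    this._tail = run.catch(() => {});
    return run;
  }
}

// Stand-in for a non-reentrant resolver call. Assumption: the real lookup
// would call into the system resolver; here we only simulate latency and
// record how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
function fakeSrvLookup(name) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise(resolve => {
    setTimeout(() => {
      inFlight--;
      resolve(`${name}: done`);
    }, 5);
  });
}

// Two accounts logging in at startup would each enqueue their SRV lookup.
const queue = new SerialQueue();
const lookups = [
  "_xmpp-client._tcp.example.org",
  "_xmpp-client._tcp.example.net",
].map(name => queue.enqueue(() => fakeSrvLookup(name)));
```

Each caller still gets its own promise back, so the accounts would not need to know about each other; the cost is that the second login waits for the first lookup to finish.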

Flags: needinfo?(philipp)

Hmm it seems res_nsearch is already being used under the hood, so my previous comment may be wrong. If this crash is reproducible I think we should rather check if all buffers are correctly allocated.

Hey Paul, you were looking at this code recently. Any idea what's happening here?

Flags: needinfo?(paul)

As I mentioned on IRC the other day, I haven't had a chance to look into this yet. Will add it to my list.

Flags: needinfo?(paul)

Is it really likely this is Mac-only?

Flags: needinfo?(clokep)

Windows uses a completely different implementation. It is possible that this bug also affects Linux, see https://dxr.mozilla.org/comm-central/rev/2a29ee0adb310b54a6a2df72034953fed8f2b043/comm/chat/modules/DNS.jsm#310-313

Flags: needinfo?(clokep)
Whiteboard: [startupcrash] → [startupcrash][rare]
Severity: critical → S3
Summary: startup Crash in res_nsearch_2 if chat account logs in at startup → startup Crash in res_nsearch_2 or dns_res_send if chat account logs in at startup

Matt, does this still happen when using a current version, and is it reproducible for you?

Both signatures still occur for version 91

Flags: needinfo?(unicorn.consulting)

I have deleted all my chat accounts from Thunderbird, so I really cannot assist with this any longer, Wayne.

To resolve my issues with account ordering and other messed up layout decisions along with internet connectivity issues, I am minimising my Thunderbird accounts to only the essentials.

Flags: needinfo?(unicorn.consulting)

Both signatures occur only for older versions of Mac, so I'm going to declare this issue as resolved

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME

res_nsearch_2 has returned in 143.0b1

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

(In reply to Corey Bryant from comment #12)

res_nsearch_2 has returned in 143.0b1

Returned in 142.0 with a vengeance, which strongly suggests a regression. In such cases, best to file a new bug.

And in general, even if it's not a clear regression, when so much time has passed and no connection has been demonstrated in the stack or steps to reproduce (this bug is cited as being chat-specific), it is also better to file a new bug.

Other points:

  • the stacks all have a lot of js so the trigger/cause may be totally unrelated to the signatures res_nsearch_2 and dns_res_send
  • significantly, we don't see crash surges for any other OS (something I normally look for)
  • all versions of macOS are represented
Status: REOPENED → RESOLVED
Closed: 4 years ago → 6 months ago
Resolution: --- → WORKSFORME

The new crashes in 142 would be from bug 1976254.
