Closed Bug 1625677 Opened 5 years ago Closed 5 years ago

Shutdown crash in [@ md_UnlockAndPostNotifies] (nss) via ldap_msgdelete (ldap60.dll) use-after-free

Categories

(Thunderbird :: General, defect)

Unspecified
Windows
defect
Not set
critical

Tracking

RESOLVED FIXED
Thunderbird 77.0
Tracking Status
thunderbird_esr68 + fixed
thunderbird75 --- wontfix
thunderbird76 --- wontfix
thunderbird77 --- fixed

People

(Reporter: wsmwk, Assigned: benc)

Attachments

(2 files)

Ben or Magnus, can you take a quick look at this?

It has been a topcrash for a few months, but I missed filing a new bug (because of old, useless, still-open Firefox bug reports with the same signature).
It is the #5 crash for betas and the #3 crash for 68.6.0 (and presumably …). In a small sample of four 68.x-or-newer crashes, all involve ldap_msgdelete.

The signature exists prior to 68.3.0, but recent history [1] shows an uptick in 68.3.0 and a much bigger uptick in 68.3.1, indicating one or more regressions. Release channel history and rate changes are complicated in this period and not fully documented, so I think it is better to judge the regression history by looking at betas [2]. We see the first strong uptick in 71, tripling in 72 and 73, and then a strong decrease at 74 (perhaps because some users gave up on LDAP or wanted to stay on 73 for other reasons?). In the table, the first two columns are the beta version and its crash count; the percentage is that version's share of the 571 beta crashes in the period.
75.0b0 79 13.8% 38
74.0b0 94 16.5% 38
73.0b0 187 32.7% 114
72.0b0 152 26.6% 90
71.0b0 56 9.8% 73
70.0b0 1 0.2% 1
60.0b0 2 0.4% 2
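A quick check of the table arithmetic, as a small sketch: it assumes the first two columns are the beta version and its crash count (the third column is left out because the report does not label it). Summing the counts gives 571, and each percentage is the count divided by that total, rounded to one decimal place.

```cpp
#include <cmath>
#include <string>
#include <vector>

// One row of the beta crash table: version label and crash count.
struct BetaRow {
  std::string version;
  int crashes;
};

// Rows copied from the table above (first two columns only).
const std::vector<BetaRow> kBetaRows = {
    {"75.0b0", 79}, {"74.0b0", 94}, {"73.0b0", 187}, {"72.0b0", 152},
    {"71.0b0", 56}, {"70.0b0", 1},  {"60.0b0", 2}};

// Total number of crashes across all listed beta versions.
int TotalCrashes(const std::vector<BetaRow>& rows) {
  int total = 0;
  for (const BetaRow& row : rows) total += row.crashes;
  return total;
}

// Share of the total, in percent, rounded to one decimal place.
double SharePercent(int crashes, int total) {
  return std::round(1000.0 * crashes / total) / 10.0;
}
```

For example, `SharePercent(187, TotalCrashes(kBetaRows))` reproduces the 32.7% listed for 73.0b0.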

72 beta 2 shipped 2019-12-14 with Bug 1601389 - Crash in [@ SearchExtRunnable::~SearchExtRunnable] which could explain the uptick in 72 beta. It also went out in 68.3.1, shipped ~2019-12-19?

The first nightly crash is bp-2292c085-fe9a-4752-8f6a-6d75e0191118 2019-11-18 14:20:54 72.0a1 buildid 20191117080452 - but perhaps we must blame Bug 1576364 - LDAP/SSL broken which landed on 2019-11-07 (Bug 1601389 did not land on trunk until 2019-12-09)

I have no convenient explanation for the uptick in 71 beta.

Also noteworthy, in January I filed Bug 1610797 - Crash in [@ nsLDAPSSLClose]

This bug is for crash report bp-4c040b81-63c0-4eb2-af86-4d6960200325.

0 nss3.dll md_UnlockAndPostNotifies nsprpub/pr/src/md/windows/w95cv.c:86
1 nss3.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:332
2 prldap60.dll prldap_mutex_unlock comm/ldap/c-sdk/libraries/libprldap/ldappr-threads.c:209
3 ldap60.dll ldap_msgdelete comm/ldap/c-sdk/libraries/libldap/result.c
4 ldap60.dll do_abandon comm/ldap/c-sdk/libraries/libldap/abandon.c:160
5 ldap60.dll ldap_abandon_ext comm/ldap/c-sdk/libraries/libldap/abandon.c:98
6 xul.dll AbandonExtRunnable::Run comm/ldap/xpcom/src/nsLDAPOperation.cpp:614
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1220
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:481
9 xul.dll mozilla::net::nsSocketTransportService::Run netwerk/base/nsSocketTransportService2.cpp:1135

[1] 6-month graph for 68.3.0, 68.3.1, 68.2.2, 68.2.1: https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=60.3.0&version=68.3.0&version=68.3.1&version=68.2.2&version=68.2.1&signature=md_UnlockAndPostNotifies&date=%3E%3D2019-09-28T10%3A48%3A00.000Z&date=%3C2020-03-28T10%3A48%3A00.000Z#graphs

[2] betas https://crash-stats.mozilla.com/signature/?product=Thunderbird&release_channel=beta&signature=md_UnlockAndPostNotifies&date=%3E%3D2019-09-28T10%3A48%3A00.000Z&date=%3C2020-03-28T10%3A48%3A00.000Z#graphs

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(benc)
Crash Signature: [@ md_UnlockAndPostNotifies] → [@ md_UnlockAndPostNotifies ]

The Mac signature is ldap_msgdelete: bp-2b05857e-ddc8-4e0a-a4b4-cbbd00200328

Crash Signature: [@ md_UnlockAndPostNotifies ] → [@ md_UnlockAndPostNotifies ] [@ ldap_msgdelete ]

Hi,
I know that our LDAP server has been down for at least the last two days; since then my TB crashes with this signature:
Thunderbird 68.6.0 Crash Report [@ md_UnlockAndPostNotifies ]
bp-60e42589-a553-478e-8cc6-b818d0200403
bp-1ee39942-2ec8-476d-88af-f1f630200403
bp-dbbf6d32-d031-4cbb-bc0c-f0b090200402

Perhaps that information can help.

Best regards
Robert

(In reply to Robert Hartmann from comment #3)

Additionally, I tried with beta and nightly:

Release: 68.6.0 (32-Bit) Thunderbird 68.6.0 Crash Report [@ md_UnlockAndPostNotifies ]
bp-a9d3d9e1-e34e-4bb2-a947-cd7450200403

Beta: 75.0b3 (64-Bit) Thunderbird 75.0 Crash Report [@ md_UnlockAndPostNotifies ]
bp-0da51773-34ea-44a3-932e-57c680200403

Nightly: 76.0a1 (2020-04-03) (64-bit) Thunderbird 76.0a1 Crash Report [@ mozilla::PresShell::DoFlushPendingNotifications ]
bp-2b2102c5-8e07-466d-9413-447920200404

Best regards
Robert

The nightly one looks different, likely bug 1530177.
If pulling the LDAP server offline triggers this, it should be easily reproducible and then also fixable.

Assignee: nobody → benc
Summary: Crash in [@ md_UnlockAndPostNotifies] (nss) via ldap_msgdelete (ldap60.dll) → Crash in [@ md_UnlockAndPostNotifies] (nss) via ldap_msgdelete (ldap60.dll) use-after-free

I think the problem is that a failed LDAP operation is left on the pending queue, and so upon shutdown, it tries to issue an 'abandon' operation to cancel it. And dies.

This patch removes operations from the pending queue if they fail.
It also tweaks the LDAP MOZ_LOG output.

Steps to reproduce the crash:
In preferences, set up a dud LDAP server (I just entered a host which has no LDAP running).
Open the addressbook and issue a lookup on that server.
Close Thunderbird.
It crashes without this patch.
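A minimal sketch of the idea behind the patch, under stated assumptions: a connection keeps a set of pending message IDs, and shutdown abandons whatever is still in it. `PendingOps`, `Start`, `Complete`, and `Shutdown` are hypothetical names for illustration, not the actual Thunderbird LDAP classes; in the real code the shutdown step is where `ldap_abandon_ext` would be called, as in the stack trace above.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of a connection's pending-operation list.
class PendingOps {
 public:
  // Registers an operation, then "sends" it. If sending fails, the
  // operation is removed again so shutdown never tries to abandon it.
  bool Start(int msgId, bool sendSucceeded) {
    pending_.insert(msgId);
    if (!sendSucceeded) {
      pending_.erase(msgId);  // the essence of the fix: drop failed ops
      return false;
    }
    return true;
  }

  // Called when the result for an operation arrives.
  void Complete(int msgId) { pending_.erase(msgId); }

  // On shutdown, collect everything still pending (real code would issue
  // an abandon for each of these) and clear the list.
  std::vector<int> Shutdown() {
    std::vector<int> abandoned(pending_.begin(), pending_.end());
    pending_.clear();
    return abandoned;
  }

  std::size_t Size() const { return pending_.size(); }

 private:
  std::set<int> pending_;
};
```

If operation 1 starts normally and operation 2 fails to send, operation 2 is erased immediately, so `Shutdown()` abandons only operation 1. Without the `erase` in `Start`, operation 2 would stay in the set and shutdown would try to abandon a request that had already failed, which is the situation the patch describes.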

Flags: needinfo?(benc)
Attachment #9144928 - Flags: review?(mkmelin+mozilla)

Not a big deal, but did the problem exist prior to version 68? (I currently have the bug marked as a regression)

Summary: Crash in [@ md_UnlockAndPostNotifies] (nss) via ldap_msgdelete (ldap60.dll) use-after-free → Shutdown crash in [@ md_UnlockAndPostNotifies] (nss) via ldap_msgdelete (ldap60.dll) use-after-free
Comment on attachment 9144928 1625677-remove-failed-ops-1.patch

Review of attachment 9144928:
-----------------------------------------------------------------

LGTM, r=mkmelin
Attachment #9144928 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/e2f7a8209767
Remove failed LDAP operations from pending list (or we'd crash). r=mkmelin

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 77.0
Attachment #9144928 - Flags: approval-comm-esr68?

(In reply to Wayne Mery (:wsmwk) from comment #7)

Not a big deal, but did the problem exist prior to version 68? (I currently have the bug marked as a regression)

I have no real evidence, but I can tell you that I have had this bug since the update from 60 -> 68. The frequency varies, but it seems to happen much more often in the current 68.8.0. The really annoying thing is that it seems to create a livelock and keeps one thread on my notebook spinning, which keeps the fan running all the time.

I'm a complete novice in Thunderbird development... When will this patch go into Thunderbird 68, or, if it won't, when will it be in the beta channel? Or is there any version I can use as a daily driver without worries?

Flags: needinfo?(mkmelin+mozilla)

(In reply to Hadrian2002 from comment #10)

By the way, my oldest crash report submitted online is this one: https://crash-stats.mozilla.org/report/index/3684243d-2bba-404b-ba6b-e3a770191209
So it seems that not 68.0 itself but 68.3.x is affected.

Likely to be released in 68.9 on June 2.
If you want to try a beta, that will also work (find it at the bottom of https://www.thunderbird.net/)

Flags: needinfo?(mkmelin+mozilla)

(In reply to Magnus Melin [:mkmelin] from comment #12)

Likely to be released in 68.9 on June 2.

This is in jeopardy.

I was preparing to suggest we take this as a point release because the crash rate is so high (#4 for 68.8.0), but I'm backing down from that idea because we still have a strong crash presence in beta and nightly.
md_UnlockAndPostNotifies examples...
bp-d57f4ecf-cb3a-47bc-8852-fc0470200515 78.0a1 Crash Address 0xffffffffffffffff
bp-14468801-617d-46f5-9e4c-496af0200516 77.0 Crash Address 0xe5e5e5e5
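The 0xe5e5e5e5 address in the second example is a use-after-free hint in its own right: mozjemalloc, the allocator in Mozilla builds, fills freed memory with 0xe5 bytes, so a pointer read out of a freed object typically takes this value. The helper below is a hypothetical triage sketch (not part of crash-stats or any Mozilla tool) that flags such addresses for a given pointer width.

```cpp
#include <cstdint>

// Hypothetical triage helper: returns true if the low `widthBytes` bytes of
// `addr` are all 0xe5 (the freed-memory poison byte) and no higher bits are
// set. widthBytes is 4 for 32-bit crash addresses and 8 for 64-bit ones.
bool LooksLikeFreedPoison(std::uint64_t addr, int widthBytes) {
  if (widthBytes < 1 || widthBytes > 8) return false;
  for (int i = 0; i < widthBytes; ++i) {
    if (((addr >> (8 * i)) & 0xff) != 0xe5) return false;
  }
  // Avoid shifting by 64 bits, which would be undefined behavior.
  if (widthBytes < 8 && (addr >> (8 * widthBytes)) != 0) return false;
  return true;
}
```

`LooksLikeFreedPoison(0xe5e5e5e5, 4)` returns true, while the first example's address, `0xffffffffffffffff`, does not match the pattern and needs other evidence.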

The crash rate might have gone down slightly in 77 beta, but it is too early to say:
https://crash-stats.mozilla.org/signature/?release_channel=%21release&product=Thunderbird&signature=md_UnlockAndPostNotifies&date=%3E%3D2020-02-16T15%3A10%3A00.000Z&date=%3C2020-05-16T15%3A10%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs

Robert, is your crash gone with 77 beta?

Flags: needinfo?(benc)
Flags: needinfo?(Robert_Hartmann)

I think this should still be a safe patch. It would only ever do something for error cases...

(In reply to Wayne Mery (:wsmwk) from comment #13)

Robert, is your crash gone with 77 beta?

Hi Wayne,
As you asked, I installed TB 77.0b3 (Build ID 20200518172851) and tried to test LDAP.
Our LDAP server is currently working (and I am not the responsible server admin), so I tried to enter an invalid server address in the TB LDAP address book configuration. That is not possible in the TB beta; a proper error message appears.

After that I used the correct LDAP address, but ran an address book lookup from Thunderbird while Windows had no network connection (no wire and no WLAN). The only information I got was that no results were found.

So maybe the leftover LDAP operation problem is gone.

Best regards
Robert

Flags: needinfo?(Robert_Hartmann)

Not really gone...
An LDAP search against a running server that does not speak LDAP makes Thunderbird crash at exit.

Version 77.0b3
Build-ID 20200518172851
bp-43047602-1d28-4023-9ad6-3cf760200523

Thunderbird 77.0 Crash Report [@ md_UnlockAndPostNotifies ]

Crashing Thread (5), Name: Socket Thread
Frame Module Signature Source Trust
0 nss3.dll md_UnlockAndPostNotifies(_MDLock*, PRThread*, _MDCVar*) nsprpub/pr/src/md/windows/w95cv.c:86 context
1 nss3.dll PR_Unlock(PRLock*) nsprpub/pr/src/threads/combined/prulock.c:332 cfi
2 prldap60.dll prldap_mutex_unlock(void*) comm/ldap/c-sdk/libraries/libprldap/ldappr-threads.c:209 cfi
3 ldap60.dll ldap_msgdelete(ldap*, int) comm/ldap/c-sdk/libraries/libldap/result.c:0 cfi
4 ldap60.dll do_abandon(ldap*, int, int, ldapcontrol**, ldapcontrol**) comm/ldap/c-sdk/libraries/libldap/abandon.c:160 cfi
5 ldap60.dll ldap_abandon_ext(ldap*, int, ldapcontrol**, ldapcontrol**) comm/ldap/c-sdk/libraries/libldap/abandon.c:98 cfi
6 xul.dll AbandonExtRunnable::Run() comm/ldap/xpcom/src/nsLDAPOperation.cpp:614 cfi
7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1200 cfi
8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:481 cfi
9 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:1130 cfi
10 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1200 cfi
11 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:481 cfi
12 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:332 cfi
13 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:308 cfi
14 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:290 cfi
15 xul.dll static nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:444 cfi
16 nss3.dll PR_NativeRunThread(void*) nsprpub/pr/src/threads/combined/pruthr.c:399 cfi
17 nss3.dll pr_root(void*) nsprpub/pr/src/md/windows/w95thred.c:139 cfi
18 ucrtbase.dll thread_start<unsigned int (__cdecl*)(void*), 1> cfi
19 kernel32.dll BaseThreadInitThunk cfi
20 mozglue.dll patched_BaseThreadInitThunk(int, void*, void*) mozglue/dllservices/WindowsDllBlocklist.cpp:592 cfi
21 ntdll.dll RtlUserThreadStart cfi

Now I tried it with nightly (entering the address of a running server that does not speak LDAP in the LDAP configuration and then doing an LDAP search):

Version 78.0a1
Build ID 20200521105808

bp-1c819dcb-adca-4233-930a-4f9190200523
Thunderbird 78.0a1 Crash Report [@ md_UnlockAndPostNotifies ]

Crashing Thread (5), Name: Socket Thread
Frame Module Signature Source Trust
0 nss3.dll md_UnlockAndPostNotifies(_MDLock*, PRThread*, _MDCVar*) nsprpub/pr/src/md/windows/w95cv.c:86 context
1 nss3.dll PR_Unlock(PRLock*) nsprpub/pr/src/threads/combined/prulock.c:332 cfi
2 prldap60.dll prldap_mutex_unlock(void*) comm/ldap/c-sdk/libraries/libprldap/ldappr-threads.c:209 cfi
3 ldap60.dll ldap_msgdelete(ldap*, int) comm/ldap/c-sdk/libraries/libldap/result.c:0 cfi
4 ldap60.dll do_abandon(ldap*, int, int, ldapcontrol**, ldapcontrol**) comm/ldap/c-sdk/libraries/libldap/abandon.c:160 cfi
5 ldap60.dll ldap_abandon_ext(ldap*, int, ldapcontrol**, ldapcontrol**) comm/ldap/c-sdk/libraries/libldap/abandon.c:98 cfi
6 xul.dll AbandonExtRunnable::Run() comm/ldap/xpcom/src/nsLDAPOperation.cpp:614 cfi
7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1211 cfi
8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:501 cfi
9 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:1134 cfi
10 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1211 cfi
11 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:501 cfi
12 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:302 cfi
13 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:308 cfi
14 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:290 cfi
15 xul.dll static nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:444 cfi
16 nss3.dll PR_NativeRunThread(void*) nsprpub/pr/src/threads/combined/pruthr.c:399 cfi
17 nss3.dll pr_root(void*) nsprpub/pr/src/md/windows/w95thred.c:139 cfi
18 ucrtbase.dll thread_start<unsigned int (__cdecl*)(void*), 1> cfi
19 kernel32.dll BaseThreadInitThunk cfi
20 mozglue.dll patched_BaseThreadInitThunk(int, void*, void*) mozglue/dllservices/WindowsDllBlocklist.cpp:592 cfi
21 ntdll.dll RtlUserThreadStart cfi

Same test in the 32-bit TB release:
Version 68.8.1
Build-ID 20200521175255

bp-87068895-94f9-4ca1-92f0-eb9bd0200523

Thunderbird 68.8.1 Crash Report [@ md_UnlockAndPostNotifies ]

Crashing Thread (8), Name: Socket Thread
Frame Module Signature Source Trust
0 nss3.dll static void md_UnlockAndPostNotifies(struct _MDLock*, struct PRThread*, struct _MDCVar*) nsprpub/pr/src/md/windows/w95cv.c:86 context
1 nss3.dll _PR_MD_UNLOCK nsprpub/pr/src/md/windows/w95cv.c:363 cfi
2 nss3.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:328 cfi
3 prldap60.dll static int prldap_mutex_unlock(void*) comm/ldap/c-sdk/libraries/libprldap/ldappr-threads.c:209 cfi
4 ldap60.dll ldap_msgdelete comm/ldap/c-sdk/libraries/libldap/result.c:1205 cfi
5 ldap60.dll static int do_abandon(struct ldap*, int, int, struct ldapcontrol**, struct ldapcontrol**) comm/ldap/c-sdk/libraries/libldap/abandon.c:160 cfi
6 ldap60.dll ldap_abandon_ext comm/ldap/c-sdk/libraries/libldap/abandon.c:98 cfi
7 xul.dll AbandonExtRunnable::Run() comm/ldap/xpcom/src/nsLDAPOperation.cpp:614 cfi
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1175 cfi
9 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486 cfi
10 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:1013 cfi
11 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1175 cfi
12 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:486 cfi
13 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:303 cfi
14 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:308 cfi
15 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:290 cfi
16 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:454 cfi
17 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 cfi
18 nss3.dll static unsigned int pr_root(void*) nsprpub/pr/src/md/windows/w95thred.c:137 cfi
19 ucrtbase.dll thread_start<unsigned int (__stdcall*)(void*), 1> cfi
20 kernel32.dll BaseThreadInitThunk cfi
21 mozglue.dll static void patched_BaseThreadInitThunk(int, void*, void*) mozglue/build/WindowsDllBlocklist.cpp:625 cfi
22 ntdll.dll _RtlUserThreadStart cfi
23 ntdll.dll _RtlUserThreadStart cfi

Comment on attachment 9144928 1625677-remove-failed-ops-1.patch

[Triage Comment]
Approved for ESR
Attachment #9144928 - Flags: approval-comm-esr68? → approval-comm-esr68+

Unfortunately, the update to 68.9.0 did not fix my problem; it still results in a similar stack trace:

https://crash-stats.mozilla.org/report/index/b0c56082-0939-4607-8d0f-7a04b0200611

I guess there is a persisting use-after-free problem. Currently I cannot switch to beta or nightly because I need extensions that are not available for them yet.

Flags: needinfo?(benc)

I'd say there was zero impact on md_UnlockAndPostNotifies https://crash-stats.mozilla.org/signature/?signature=md_UnlockAndPostNotifies&date=%3E%3D2020-06-05T15%3A51%3A00.000Z&date=%3C2020-12-05T15%3A51%3A00.000Z#graphs

But all of version 78 has a lower crash rate for ldap_msgdelete, so this patch may have had an impact on that signature.

The uplift to 68.9.0 may also have had an impact on the ldap_memcache_abandon signature of bug 1617786 [1]:
Version Crashes
68.3.1 4
68.4.1 2
68.5.0 5
68.6.0 6
68.8.0 6
68.8.1 33
68.9.0 2
68.10.0 5
68.11.0 2
78.3.1 1
78.4.0 1
[1] https://crash-stats.mozilla.org/signature/?signature=ldap_memcache_abandon&date=%3E%3D2020-06-05T16%3A22%3A00.000Z&date=%3C2020-12-05T16%3A22%3A00.000Z#summary

Blocks: 1680914