Open Bug 335303 Opened 19 years ago Updated 1 year ago

Null pointer dereference in pt_PostNotifyToCvar (pr/src/pthreads/ptsynch.c)

Categories

(NSPR :: NSPR, defect, P3)

Tracking

(Not tracked)

People

(Reporter: kherron+mozilla, Unassigned)

References

()

Details

(Keywords: coverity, crash)

Crash Data

Attachments

(1 file)

This was found through a coverity scan of the firefox source code. Please refer to the sample URL. At lines 323-325, |pt_PostNotifyToCvar| checks |notified->link| for NULL, calls |PR_NEWZAP| to allocate a _PT_Notified structure, and assigns the resulting pointer to |notified|. |PR_NEWZAP| wraps |PR_Calloc|, which may call |calloc|, which may return NULL. This allocation isn't checked so |notified| may contain NULL after line 325. This code is in a loop beginning at line 306. On the next iteration, the possible NULL |notified| is dereferenced at line 308.
Target Milestone: --- → 4.7
QA Contact: wtchang → nspr
wtc: this is scary, i don't see any way to recover...
Severity: minor → blocker
We've recently seen severalnull-pointer related crashes in pt_PostNotifyToCvar in automated Fedora crash reports: https://bugzilla.redhat.com/show_bug.cgi?id=544730 #4 pt_PostNotifyToCvar (cvar=0x0, broadcast=0) at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:312 index = 0 notified = <value optimized out> #5 0x07152fb1 in PR_NotifyCondVar (cvar=<value optimized out>) at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:445 No locals.
In the 3 recent crash reports the calls to PR_NotifyCondVar came in from ProcessReapedChildInternal which does only assert-based null checking.
I've seen the same crash reports in Komodo as well (running CentOS 5.3 and getting 1-2 crashes per day with this stack trace): Operating system: Linux 0.0.0 Linux 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64 GNU/Linux CPU: x86 GenuineIntel family 10 model 15 stepping 6 4 CPUs Crash reason: SIGSEGV Crash address: 0xf7dc4246 Thread 15 (crashed) 0 libnspr4.so!pt_PostNotifyToCvar [ptsynch.c:56248d52ac25 : 312 + 0x0] eip = 0xf7dc4246 esp = 0xe824a140 ebp = 0xe824a168 ebx = 0xf7dd357c esi = 0x0ac24178 edi = 0x00000000 eax = 0x00000000 ecx = 0x000007f1 edx = 0x00000000 efl = 0x00010282 Found by: given as instruction pointer in context 1 libnspr4.so!PR_NotifyCondVar [ptsynch.c:56248d52ac25 : 445 + 0x7] eip = 0xf7dc4380 esp = 0xe824a170 ebp = 0xe824a178 Found by: previous frame's frame pointer 2 libnspr4.so!WaitPidDaemonThread [uxproces.c:56248d52ac25 : 553 + 0xa] eip = 0xf7dcd485 esp = 0xe824a180 ebp = 0xe824a398 Found by: previous frame's frame pointer 3 libnspr4.so!_pt_root [ptthread.c:56248d52ac25 : 228 + 0x8] eip = 0xf7dca91d esp = 0xe824a3a0 ebp = 0xe824a3b8 Found by: previous frame's frame pointer 4 libpthread-2.5.so + 0x5831 eip = 0x00300832 esp = 0xe824a3c0 ebp = 0xe824a4a8 Found by: previous frame's frame pointer Does anyone have an idea on what is causing this crash?
What could be the solution for this ? Make sure to check if it's still null at around. 323-325 , and loop until it's not null if memory's available (In reply to Kenneth Herron from comment #0) > This was found through a coverity scan of the firefox source code. Please > refer > to the sample URL. > > At lines 323-325, |pt_PostNotifyToCvar| checks |notified->link| for NULL, > calls |PR_NEWZAP| to allocate a _PT_Notified structure, and assigns the > resulting pointer to |notified|. |PR_NEWZAP| wraps |PR_Calloc|, which may > call |calloc|, which may return NULL. This allocation isn't checked so > |notified| may contain NULL after line 325. > > This code is in a loop beginning at line 306. On the next iteration, the > possible NULL |notified| is dereferenced at line 308.
Do you still see it? NPR current release seems to be at 4.8.3. I wonder if this is even valid anymore (In reply to Todd Whiteman from comment #4) > I've seen the same crash reports in Komodo as well (running CentOS 5.3 and > getting 1-2 crashes per day with this stack trace): > > Operating system: Linux > 0.0.0 Linux 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 > EDT > 2010 x86_64 GNU/Linux > CPU: x86 > GenuineIntel family 10 model 15 stepping 6 > 4 CPUs > > Crash reason: SIGSEGV > Crash address: 0xf7dc4246 > > Thread 15 (crashed) > 0 libnspr4.so!pt_PostNotifyToCvar [ptsynch.c:56248d52ac25 : 312 + 0x0] > eip = 0xf7dc4246 esp = 0xe824a140 ebp = 0xe824a168 ebx = 0xf7dd357c > esi = 0x0ac24178 edi = 0x00000000 eax = 0x00000000 ecx = 0x000007f1 > edx = 0x00000000 efl = 0x00010282 > Found by: given as instruction pointer in context > 1 libnspr4.so!PR_NotifyCondVar [ptsynch.c:56248d52ac25 : 445 + 0x7] > eip = 0xf7dc4380 esp = 0xe824a170 ebp = 0xe824a178 > Found by: previous frame's frame pointer > 2 libnspr4.so!WaitPidDaemonThread [uxproces.c:56248d52ac25 : 553 + 0xa] > eip = 0xf7dcd485 esp = 0xe824a180 ebp = 0xe824a398 > Found by: previous frame's frame pointer > 3 libnspr4.so!_pt_root [ptthread.c:56248d52ac25 : 228 + 0x8] > eip = 0xf7dca91d esp = 0xe824a3a0 ebp = 0xe824a3b8 > Found by: previous frame's frame pointer > 4 libpthread-2.5.so + 0x5831 > eip = 0x00300832 esp = 0xe824a3c0 ebp = 0xe824a4a8 > Found by: previous frame's frame pointer > > > Does anyone have an idea on what is causing this crash?
Yes, I still see this crash a lot of times (using Gecko 7 code base) - it's the top crash report for Komodo running on Linux (by a large margin). I cannot try a newer NSPR yet - perhaps in 6 months when we've updated Komodo to Gecko 17. Operating system: Linux 0.0.0 Linux 2.6.38-15-generic #64-Ubuntu SMP Fri Jul 6 18:51:28 UTC 2012 x86_64 CPU: amd64 family 6 model 30 stepping 5 4 CPUs Crash reason: SIGSEGV Crash address: 0x0 Thread 17 (crashed) 0 libnspr4.so!pt_PostNotifyToCvar [ptsynch.c:a6c276265898 : 312 + 0x0] rbx = 0x00000000 r12 = 0x00000000 r13 = 0xc3dfee40 r14 = 0x00000004 r15 = 0x00000007 rip = 0xe47e5207 rsp = 0xc3dfed80 rbp = 0xc3dfedc0 Found by: given as instruction pointer in context 1 libnspr4.so!PR_NotifyCondVar [ptsynch.c:a6c276265898 : 445 + 0x6] rbx = 0x00000000 r12 = 0xc3dfee5c r13 = 0xc3dfee40 r14 = 0x00000004 r15 = 0x00000007 rip = 0xe47e534b rsp = 0xc3dfeda0 rbp = 0xc3dfedc0 Found by: call frame info 2 libnspr4.so!WaitPidDaemonThread [uxproces.c:a6c276265898 : 680 + 0x11] rbx = 0x00000000 r12 = 0xc3dfee5c r13 = 0xc3dfee40 r14 = 0x00000004 r15 = 0x00000007 rip = 0xe47ed7f5 rsp = 0xc3dfedb0 rbp = 0xc3dfedc0 Found by: call frame info 3 libnspr4.so!_pt_root [ptthread.c:a6c276265898 : 187 + 0x6] rbx = 0xc6c35370 r12 = 0xbafb36f0 r13 = 0xc3dff9c0 r14 = 0x00000004 r15 = 0x00000007 rip = 0xe47eb5b3 rsp = 0xc3dfee90 rbp = 0x00000000 Found by: call frame info 4 libpthread-2.13.so + 0x6d8b rbx = 0x00000000 r12 = 0xbafb36f0 r13 = 0xc3dff9c0 r14 = 0x00000004 r15 = 0x00000007 rip = 0xe58fcd8c rsp = 0xc3dfeeb0 rbp = 0x00000000
I wonder if bug 745212 has some affect on this, as Komodo has an embedded Python, which is used to launch (thousands of) external processes... Python often uses "getpid" when checking the state of these processes.
We're seeing this as well on debian systems, using icedove and enigmail: http://bugs.debian.org/672860 http://bugs.debian.org/510589
Blocks: 908209
Here is a quick-and-dirty patch that makes the problem go away for me, certainly no root cause fix here, but thunderbird is no longer segfaulting.
Patrick, what call stack do you see in your crash? The same as reported earlier, or a different one? The original report warned about a potential allocation failure. If allocation fails, we'll fail immediately afterwards, inside the loop, on the next iteration, because the loop will immediately dereference "notified", which is NULL after a failed PR_NEWZAP allocation. This is different from the crash that Patrick tries to work around. The suggested patch from comment 10 adds a NULL check at the beginning at the function. If that helps, then the crash Patrick experiences cannot be caused by an allocation failure later in that function. The patch suggests to check (&cvar == NULL). I don't think that's a check you want. I think &cvar will never be NULL, because &cvar is the address where the function parameter is stored. You could try to check if the cvar parameter is null. Patrick, would you like to test, if instead of your patch, the following check, at the same location, is sufficient to stop your crash? if (cvar == NULL) {return;} If it is sufficient, then an invalid parameter is passed. If this helps, I'd like to see a call stack. To get it, you could replace the if check with PR_ASSERT(cvar != NULL). In the second part of your suggested patch, you check for (&cvar->lock == NULL). Again, I think what you really want is (cvar->lock == NULL). (cvar->lock) already is a pointer. I think that (&cvar->lock) will only be NULL if (cvar->lock) is NULL. Patrick, if if (cvar == NULL) {return;} is insufficient to fix the crash, please check if if (cvar == NULL || cvar->lock == NULL) {return;} equivalently fixes the crash as your suggested patch. Next, please try if if (cvar->lock == NULL) {return;} is sufficient to fix the crash.

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: wtc → nobody

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: blocker → --

The severity field is not set for this bug.
:KaiE, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(kaie)

Patrick didn't follow up to my questions 5.5 years ago.
I believe we don't have a way to reliably reproduce this crash manually?

Re-reading my comment from 2018, it seems I had concluded that the cause of the crash is unclear, and that the necessary fix is unclear.

I'd like to know if we still get these crashes. The debian bug report mentioned the crash most recently in 2017.

Flags: needinfo?(kaie)
Severity: -- → S3

Looks like we still do, yes. Albeit very low-volume.

Crash Signature: [@ pt_PostNotifyToCvar ]
Keywords: crash
See Also: → 1236012
Priority: -- → P1
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: