mozilla1.1a core dump at PR_AtomicDecrement()

RESOLVED FIXED

Status

MailNews Core
Composition
P2
critical
RESOLVED FIXED
16 years ago
10 years ago

People

(Reporter: Antonio.Xu, Assigned: Jean-Francois Ducarroz)

Tracking

({fixedOEM})

Trunk
x86
All
fixedOEM

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

16 years ago
Mozilla1.1a builds on Solaris 8 and Windows 2000 dump core at
PR_AtomicDecrement() after the following steps:

1. Start Mozilla1.1a
2. Start Mail & News
3. Click Compose button multiple times
4. Close the invoked compose window immediately

Repeat steps 3-4 several times.

I'm seeing this problem even in C locale.

current thread: t@1
=>[1] PR_AtomicDecrement(0x18, 0x0, 0xffbedfc4, 0xfd8fcd41, 0x6, 0xffbee22c),
at 0xfea202fc
  [2] pt_PostNotifies(0xa9cce8, 0x1, 0xffbedfc4, 0xa9cce8, 0xfd746850, 0x1), at 
0xff052a9c
  [3] PR_Unlock(0xa9cce8, 0x0, 0x14, 0x2710, 0x128d2b8, 0x0), at 0xff052c30
  [4] nsWebShellWindow::FirePersistenceTimer(0x126dfb0, 0x11aaa80, 0xd2ce0ba9, 
0xd2cebd0b, 0x21, 0x13fd7c0), at 0xfdfc9138
  [5] nsTimerImpl::Fire(0x126dfb0, 0x1, 0x136f1f0, 0xfeb3801c, 0x62696e, 
0x66696c), at 0xff146c90
  [6] handleTimerEvent(0x133ea38, 0x0, 0x136f220, 0xfebd4840, 0xff140f78, 0x0), 
at 0xff146e08
  [7] PL_HandleEvent(0x133ea38, 0x208264, 0x0, 0xac970, 0xfee4cb30, 0x0), at 
0xff140f10
  [8] PL_ProcessEventsBeforeID(0x97d78, 0xb720, 0x13fcca8, 0xfeffcbdc, 
0xfee4cb30, 0x0), at 0xff1413f0
  [9] processQueue(0x97d78, 0xb720, 0xf, 0x203760, 0xfee4cb30, 0x80), at 
0xfdb3f8a8
  [10] nsVoidArray::EnumerateForwards(0xd3580, 0xfdb3f8a0, 0xb720, 0xd3580, 
0xfdb48b20, 0xfdb48688), at 0xff0fec98
  [11] handle_gdk_event(0x1329b80, 0x0, 0xfdb485d0, 0xfee4cb30, 0xfee4ef94, 
0x0), at 0xfdb48b20
  [12] gdk_event_dispatch(0x0, 0xffbeeb48, 0x0, 0x0, 0x179d40, 0x0), at 
0xfefd6bc4
  [13] g_main_dispatch(0x0, 0xfee4efc0, 0xfee4efbc, 0xfee4efc4, 0xff3e2660, 
0xfee93b57), at 0xfee26e20
  [14] g_main_iterate(0x1, 0x1, 0x1, 0xfee4ef30, 0x1, 0xfee28138), at 0xfee27634
  [15] g_main_run(0x1df798, 0x0, 0xff14312c, 0xfdb3f3d0, 0x0, 0x0), at 
0xfee27870
  [16] gtk_main(0x0, 0x6a480, 0xffbeec6c, 0x80000000, 0xff1cce44, 0xff1cd3e4), 
at 0xfef0ff30
  [17] nsAppShell::Run(0x1810f0, 0x1810f0, 0xfdb3f584, 0x1a3f8, 0xfdfe4373, 
0x0), at 0xfdb3f5c4
  [18] main1(0x80000000, 0x21f65, 0xff1cda1c, 0xff1cd85c, 0x0, 0x800), at 
0x1a43c
  [19] main(0x6, 0xffbeeef4, 0xffbeef10, 0x6a000, 0x0, 0x0), at 0x1afc0
(dbx) quit
(Reporter)

Updated

16 years ago
Priority: -- → P2

Comment 1

16 years ago
Created attachment 89507 [details]
stacktrace with full symbols

Slightly different stack trace from a Linux debug build:

(gdb) frame 1
#1  0x4026eca7 in PR_Lock (lock=0xdadadada) at ptsynch.c:190
190	    rv = pthread_mutex_lock(&lock->mutex);
Current language:  auto; currently c
(gdb) p lock
$1 = (PRLock *) 0xdadadada
(gdb) p lock->mutex
Cannot access memory at address 0xdadadada


Another debug run (not under gdb) gave the following assertion before crashing:

Assertion failure: 0 == rv, at ptsynch.c:191

Comment 2

16 years ago
OS=>All
OS: Windows 2000 → All
(Reporter)

Comment 3

16 years ago
Created attachment 90176 [details] [diff] [review]
patch version 1.00,please r=? & sr=?

This patch fixes the bug. The crash happens because the timer can still fire
after the compose window has been destroyed: the timer runs
"nsWebShellWindow::FirePersistenceTimer", which tries to use members of the
destroyed window object, and that makes Mozilla crash. When ~nsWebShellWindow
runs, it checks whether mSPTimer is null and, if it is not, cancels the timer.
However, the window object sometimes sets the timer many times. When the first
fire runs, it sets mSPTimer to null. If we then close the compose window and
destroy the window object, the destructor does not cancel the timer because
mSPTimer is already null, so the next fire makes Mozilla crash. I think that if
we set mSPTimer to null every time nsWebShellWindow::FirePersistenceTimer runs,
the window object will create a new timer object for itself the next time
"void nsWebShellWindow::SetPersistenceTimer" runs. So I think my fix is good
for this problem.
Please r=? & sr=?
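
The sequence described in this comment can be modelled with a small standalone
sketch. It is not the Mozilla code: FakeTimer, CloseResult and
armTwiceFireOnceThenClose are hypothetical names, and the model assumes that
setting the timer a second time queues a second fire. It only shows why a
destructor that cancels "if mSPTimer is non-null" can miss a fire that is
still pending.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a timer; not the real nsITimer interface.
struct FakeTimer {
    int queuedFires = 0;                 // fire events waiting for delivery
    void Arm()    { ++queuedFires; }     // assumption: each (re-)arm queues one fire
    void Cancel() { queuedFires = 0; }
};

struct CloseResult {
    bool destructorCancelled;  // did the destructor's null check pass?
    int  staleFires;           // fires left that would hit the destroyed window
};

CloseResult armTwiceFireOnceThenClose() {
    auto timer = std::make_shared<FakeTimer>();
    std::shared_ptr<FakeTimer> windowRef = timer;  // plays the role of mSPTimer

    windowRef->Arm();          // timer is set
    windowRef->Arm();          // timer is set again before the first fire

    --timer->queuedFires;      // the first fire is delivered...
    windowRef.reset();         // ...and the callback clears the window's reference

    // Window destructor: cancel only if the reference is still non-null.
    bool cancelled = false;
    if (windowRef) {
        windowRef->Cancel();
        cancelled = true;
    }

    return {cancelled, timer->queuedFires};
}
```

Under these assumptions the destructor skips the cancel and one fire is left
over, which corresponds to the callback running against a freed window.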
(Reporter)

Comment 4

16 years ago
I have looked into the timer code, and I think the timer itself may have a
problem. When we init a timer, it is added to TimerThread::mTimers. The timer
is then removed from TimerThread::mTimers and released in TimerThread::Run(),
after which it uses timer::PostTimerEvent() to pass itself to nsTimerManager,
which tries to fire it. So I think the problem is how to decide that a timer
has run. If a timer counts as having run once it has been removed from
TimerThread::mTimers in TimerThread::Run(), then the timer code has no
problem. But if a timer counts as having run only once its fire function has
run, then a timer that has already been removed from TimerThread::mTimers in
TimerThread::Run() should be suppressed from firing before
nsTimerImpl::SetDelay runs. The most important question is therefore which
definition to use: "when the timer was removed from TimerThread::mTimers in
TimerThread::Run()" or "when the timer's fire function has run". I think the
first answer is the better one; if we choose the second answer we will miss
some events.
If we prefer the second answer, we should add some code like this:

void
nsWebShellWindow::SetPersistenceTimer(PRBool aSize, PRBool aPosition, PRBool 
aMode)
{
  PR_Lock(mSPTimerLock);
  if (mSPTimer) {
+   mSPTimer->Cancel();    
    mSPTimer->SetDelay(SIZE_PERSISTENCE_TIMEOUT);
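
The effect of the added Cancel() call can be illustrated with a toy model.
This is only a sketch: FakeTimer and queuedAfterRearms are hypothetical names,
and the model assumes that SetDelay on a timer whose event is already queued
adds another fire, which is the behaviour suspected in this bug rather than
documented timer semantics.

```cpp
#include <cassert>  // for the assertions mentioned in the usage note

// Hypothetical stand-in for a timer; not the real nsITimer interface.
struct FakeTimer {
    int queuedFires = 0;                 // fire events waiting for delivery
    void SetDelay() { ++queuedFires; }   // assumption: re-arming may queue an extra fire
    void Cancel()   { queuedFires = 0; }
};

// Re-arm the timer n times, optionally cancelling before each re-arm,
// and report how many fires end up queued.
int queuedAfterRearms(int n, bool cancelFirst) {
    FakeTimer t;
    for (int i = 0; i < n; ++i) {
        if (cancelFirst) {
            t.Cancel();                  // drop any fire that is already queued
        }
        t.SetDelay();
    }
    return t.queuedFires;
}
```

In this model queuedAfterRearms(3, false) leaves three fires queued, while
queuedAfterRearms(3, true) leaves one, so a single fire that clears mSPTimer
would not leave another fire pending behind it.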
(Reporter)

Comment 5

16 years ago
please r=? my patch

Comment 6

16 years ago
I'd rather that Pavlov r= this
(Reporter)

Comment 7

16 years ago
Created attachment 90470 [details] [diff] [review]
patch version 1.01,please r=? & sr=?

Changed some code according to bryner's advice. Please r=? & sr=?
Thank you
Attachment #90176 - Attachment is obsolete: true
Comment on attachment 90470 [details] [diff] [review]
patch version 1.01,please r=? & sr=?

Ok, looks good to me, but I think the underlying timer issue should be
investigated as well (that being that calling SetDelay on a timer that has not
yet fired can make it fire twice).

r=bryner
Attachment #90470 - Flags: review+
Comment on attachment 90470 [details] [diff] [review]
patch version 1.01,please r=? & sr=?

sr=jst
Attachment #90470 - Flags: superreview+

Comment 10

16 years ago
Comment on attachment 90470 [details] [diff] [review]
patch version 1.01,please r=? & sr=?

a=asa (on behalf of drivers) for checkin to the 1.1 trunk.
Attachment #90470 - Flags: approval+

Comment 11

16 years ago
checked in
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Updated

16 years ago
Whiteboard: branchOEM

Updated

16 years ago
Whiteboard: branchOEM → branchOEM+

Comment 12

16 years ago
checked in NETSCAPE_7_0_OEM_BRANCH (a=jdunn)
Whiteboard: branchOEM+ → fixedOEM

Updated

16 years ago
Whiteboard: fixedOEM → branchOEM+ fixedOEM

Updated

16 years ago
Keywords: fixedOEM
Whiteboard: branchOEM+ fixedOEM

Comment 13

15 years ago
Using trunk builds 200211-25 on winxp and linux and macosx, I crash trying this
scenario.  Not sure if it's the same crash but here is the winxp talkback.
nsComposerCommandsUpdater::SelectionIsCollapsed
[c:/builds/seamonkey/mozilla/editor/composer/src/nsComposerCommandsUpdater.cpp,
line 367]
nsComposerCommandsUpdater::TimerCallback
[c:/builds/seamonkey/mozilla/editor/composer/src/nsComposerCommandsUpdater.cpp,
line 265]
nsComposerCommandsUpdater::Notify
[c:/builds/seamonkey/mozilla/editor/composer/src/nsComposerCommandsUpdater.cpp,
line 383]
nsTimerImpl::Fire [c:/builds/seamonkey/mozilla/xpcom/threads/nsTimerImpl.cpp,
line 380]
nsAppShell::Run [c:/builds/seamonkey/mozilla/widget/src/windows/nsAppShell.cpp,
line 177]
nsAppShellService::Run
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 472]
main1 [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1557]
main [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1905]
WinMain [c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp, line 1925]
WinMainCRTStartup()
kernel32.dll + 0x217c7 (0x77e817c7) 

and here is all the linux talkback has in the report:
SIGSEGV: Segmentation Fault: (signal 11)

Is this a different bug or the same as this one?

To reproduce, follow the steps in the original scenario. Reopening until
someone can tell me if this is the same bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Product: MailNews → Core

Comment 14

13 years ago
I can't reproduce this and lack of comments suggests nobody else can either.

Re-resolving as FIXED since a patch went in.
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago → 13 years ago
Resolution: --- → FIXED
Product: Core → MailNews Core