Closed
Bug 134070
Opened 23 years ago
Closed 23 years ago
Crash dismissing dialogs [@ _pl_GetEventCount]
Categories
(Core :: XPCOM, defect)
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: bryner, Assigned: bryner)
References
Details
(Keywords: crash, topcrash, Whiteboard: [ADT])
Crash Data
Attachments
(1 file, 3 obsolete files)
1.31 KB,
patch
|
bryner
:
review+
bryner
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
On my debug build on Linux (current trunk), I'm seeing certain dialogs (such as
the mail password dialog or the "Incorrect password" dialog) cause a crash as
the dialog is dismissed. This doesn't happen every time, but does happen 10% of
the time for me. The following message is printed:
pthread_mutex_lock failed: rv = 22
Assertion failure: 0 == rv, at
/builds/test/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:186
/builds/test/mozilla/obj-dbg/dist/bin/run-mozilla.sh: line 72: 12678 Aborted
Further investigation shows that pthread_mutex_lock returns EINVAL, because the
lock itself is no longer valid, because the PLEventQueue has already been
destroyed. However, it is being told to process events from
event_processor_callback in widget/src/gtk/nsAppShell.cpp.
Assignee | ||
Comment 1•23 years ago
|
||
I believe this crash will occur on debug builds due to the definition of PR_ASSERT.
Keywords: crash
I see this on Mandrake 8.1 with the commercial 2002-03-29-11 build:
_pl_GetEventCount()
PL_ProcessPendingEvents()
nsEventQueueImpl::ProcessPendingEvents()
event_processor_callback()
our_gdk_io_invoke()
libglib-1.2.so.0 + 0xfbd4 (0x40393bd4)
libglib-1.2.so.0 + 0x11390 (0x40395390)
libglib-1.2.so.0 + 0x1196f (0x4039596f)
libglib-1.2.so.0 + 0x11b2b (0x40395b2b)
libgtk-1.2.so.0 + 0x986e3 (0x402a56e3)
nsAppShell::Run()
nsAppShellService::Run()
netscape-bin + 0x84e9 (0x080504e9)
netscape-bin + 0x8cc7 (0x08050cc7)
libc.so.6 + 0x1c5b0 (0x404e95b0)
Keywords: nsbeta1
Keywords: nsbeta1
Assignee | ||
Comment 3•23 years ago
|
||
Here's a patch which should fix the problem (or at least the manifestation of
it that I was seeing). stephend, can you verify if this fixes your problem as
well? We also need to check whether this causes any slowdown of startup/new
window/page load.
Summary: Crash dismissing dialogs → Crash dismissing dialogs [@ _pl_GetEventCount()]
Assignee | ||
Comment 4•23 years ago
|
||
Actually, use this one instead.
Attachment #76942 -
Attachment is obsolete: true
Comment 5•23 years ago
|
||
Brian,
How is the nsEventQueue object being accessed from a thread other than the
thread that owns the queue? In other words, how would PL_IsQueueOnCurrentThread
be false if called from CheckForDeactivation()?
Assignee | ||
Comment 6•23 years ago
|
||
Doug, it's being accessed from the timer thread. This creates a problem because
we then try to unhook glib's IO channel for the queue on the timer thread, and
if glib is in the middle of dispatching an event for that queue (on the main
thread), we end up accessing a deleted PLEventQueue.
Comment 7•23 years ago
|
||
Can the check for thread access be higher than CheckForDeactivation()?
Comment 8•23 years ago
|
||
Comment on attachment 76972 [details] [diff] [review]
patch #2
r=dougt
Attachment #76972 -
Flags: review+
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Comment 10•23 years ago
|
||
Attachment #76972 -
Flags: superreview+
Comment 11•23 years ago
|
||
*** Bug 134361 has been marked as a duplicate of this bug. ***
Comment 12•23 years ago
|
||
Is Bug 132597 a dup of this? The stack signature is the same...but that bug
seems to be about logging into mail. There might be a relation, but I'm not
sure. Can't hurt to take a look though.
Adding topcrash since this is currently the #2 topcrash on the Trunk.
Keywords: topcrash
Updated•23 years ago
|
Summary: Crash dismissing dialogs [@ _pl_GetEventCount()] → Crash dismissing dialogs [@ _pl_GetEventCount]
Comment 13•23 years ago
|
||
Adding saari, gordon and joki to make sure this patch doesn't introduce a leak.
Assignee | ||
Comment 14•23 years ago
|
||
darin, can you think of any way that your checkin might have caused this?
Comment 15•23 years ago
|
||
Comment on attachment 76972 [details] [diff] [review]
patch #2
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #76972 -
Flags: approval+
Jay, it's the same. Logging into mail requires typing a password (unless, of
course, you have it saved).
*** Bug 132597 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 18•23 years ago
|
||
Darin, I'm not able to reproduce this with your patch for bug 128508
(nsEventQueueService.cpp, rev 1.34) backed out.
Comment 19•23 years ago
|
||
bryner: ok, good... i think. now, to figure out why that is causing this problem.
Comment 20•23 years ago
|
||
ok, nsEventQueue::CheckForDeactivation is only called from the following methods:
StopAcceptingEvents
ProcessPendingEvents
GetEvent
WaitForEvent
GetYoungestActive
the only method being called from nsTimerImpl is GetYoungestActive. it is being
called indirectly via nsEventQueueService::GetThreadEventQueue (because of my
patch).
it looks like GetYoungestActive was not designed to be called from a background
thread. (i'm basing that on the fact that CheckForDeactivation probably wasn't
meant to be called from a background thread.)
bryner's patch probably nails this problem shut. another solution would be to
not call CheckForDeactivation from GetYoungestActive. a quick LXR search
reveals that GetThreadEventQueue is the only method that'll lead to
GetYoungestActive being called, and before my patch it was only calling
GetYoungest, which didn't call CheckForDeactivation. in other words, prior to
my patch, nsEventQueue::GetYoungestActive was never being executed. so, why
does it need to call CheckForDeactivation?
i am slightly worried that we aren't protected against SetYounger(nsnull) being
called while inside GetYoungestActive. is there anything to guarantee that that
won't happen? the same potential problem existed when we were calling
GetYoungest, so maybe it is protected against somehow?!
Comment 21•23 years ago
|
||
GetYoungestActive must be able to be called from *any* thread. This is so that
thread B can find out what event queue on thread A should be used for posting.
Why CheckForDeactivation in GetYoungestActive? It would seam that this check
should be only called after events are processed (in ProcessPendingEvents,
GetEvent). The problem is that these calls know nothing about nested events queues.
Comment 22•23 years ago
|
||
via IRC, timeless mentioned seeing the following crash:
<timeless> fwiw, i've experience a crash near getyoungestactive, it's really odd
because msvs claims that if(foo[==0]) foo->get... (...) // <- is called even
though foo is listed as null
<darin> ah.. so you are seeing the race condition i was afraid might exist
where |foo| is mYoungerQueue (i think).
Comment 23•23 years ago
|
||
dougt: i agree.. GetYoungestActive must be callable from any thread, and it
sounds like you agree that GetYoungestActive shouldn't be calling
CheckForDeactivation?!
is there any reason not to remove it?
Comment 24•23 years ago
|
||
yes. risk of a leak. I feel much more comfortable if we just go with patch #2
this late in the game. For the trunk, we can rip it out and have more bake time.
Comment 25•23 years ago
|
||
uhm... but before my recent checkin, we were calling GetYoungest instead of
GetYoungestActive. IOW, we weren't calling CheckForDeactivation. so, why call
it now? where's the risk?
Comment 26•23 years ago
|
||
ignore me. what darin said. we should be okay to remove CheckForDeactivation
from GetYoungestActive.
I would like danm to review this change also. Dan, are we hosing ourselves by
removing this check from GetYoungestActive?
Comment 27•23 years ago
|
||
Removing adt1.0.0. Pls renominate when you have all reviews.
Keywords: adt1.0.0
Whiteboard: [ADT]
Comment 28•23 years ago
|
||
My take on why Darin's patch for bug 99026 exposed this bug: my original
impression scores one, even though I couldn't defend it. I think that during a
single pass through ProcessPendingEvents, another event gets pushed onto the
youngest queue and then the queue is inactivated. With the current version of
GetYoungestEventQueue, a subsequent pass through PPE won't pick up that event. I
remember being able to shoot down every such possible scenario when I reviewed
that patch, but I think we must have missed this (simple) one.
So I'm still concerned that an event is going unprocessed.
However, I agree with y'all; I think we won't introduce any leaks by removing
CheckForDeactivation altogether from GetYoungestActive, and that's the best
thing to do here. Meanwhile it probably wouldn't hurt to also add Brian's
PL_IsQueueOnCurrentThread patch. (I suppose Brian gets to write the combined
patch and run it through process now?)
Comment 29•23 years ago
|
||
*** Bug 134702 has been marked as a duplicate of this bug. ***
Comment 30•23 years ago
|
||
>Meanwhile it probably wouldn't hurt to also add Brian's
PL_IsQueueOnCurrentThread patch. (I suppose Brian gets to write the combined
patch and run it through process now?)
Be sure to assert in debug code in this case!
Comment 31•23 years ago
|
||
*** Bug 135296 has been marked as a duplicate of this bug. ***
Comment 32•23 years ago
|
||
*** Bug 135224 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 33•23 years ago
|
||
I think this does everything dougt is suggesting. Can I get r=/sr=?
Attachment #76972 -
Attachment is obsolete: true
Comment 34•23 years ago
|
||
Comment on attachment 77631 [details] [diff] [review]
new patch
i thought dougt was saying that an assertion should be enough, but perhaps it's
best to beef this up anyways. how about using NS_ERROR instead of
NS_ASSERTION?
either way, sr=darin
Attachment #77631 -
Flags: superreview+
Comment 35•23 years ago
|
||
Comment on attachment 77631 [details] [diff] [review]
new patch
I would have made it so that on debug build PL_IsQueueOnCurrentThread isn't
called twice.
In either case r=dougt
Attachment #77631 -
Flags: review+
Assignee | ||
Comment 36•23 years ago
|
||
Implemented everyone's suggestions.
Attachment #77631 -
Attachment is obsolete: true
Assignee | ||
Comment 37•23 years ago
|
||
Comment on attachment 77728 [details] [diff] [review]
final patch
carrying over r=dougt, sr=darin.
Attachment #77728 -
Flags: superreview+
Attachment #77728 -
Flags: review+
Comment 38•23 years ago
|
||
Comment on attachment 77728 [details] [diff] [review]
final patch
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77728 -
Flags: approval+
Comment 40•23 years ago
|
||
I'm seeing this with increased frequency using apr5 commercial trunk build,
linux rh6.2 -- happens every single time I login (using correct password).
Assignee | ||
Comment 41•23 years ago
|
||
Fix is checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 42•23 years ago
|
||
*** Bug 136939 has been marked as a duplicate of this bug. ***
Comment 43•23 years ago
|
||
If it's checked in, it's back. as of the 2002041003 trunk builds.
Assignee | ||
Comment 44•23 years ago
|
||
Do you have a talkback ID of a crash dismissing a dialog from that build?
Comment 45•23 years ago
|
||
I don't see any crashes with this stack signature in talkback data for 4/10
builds...so if Charles did crash dismissing dialogs it must have a different
stack signature. I searched by Charles email address and none of his crashes
seem to be related to this bug.
Charles: If you are able to reproduce this and get a Talkback incident, please
let us know. Your crash will most likely need to be logged as a new bug.
Verifying fixed based on Talkback data.
Status: RESOLVED → VERIFIED
Comment 46•23 years ago
|
||
Incident ID 5090976
Build 2002041003
What:
Open mail client from browser, enter password, hit enter key, and crash.
Happened using three different user profiles/mail accounts.
Email me if you need specific configuration info (server/account/etc)
Comment 47•23 years ago
|
||
Charles crash looks different:
Incident ID 5090976
Stack Signature PR_EnterMonitor aef27f05
Trigger Time 2002-04-11 11:29:42
Email Address carosendahl@netscape.com
URL visited
Build ID 2002041011
Product ID MozillaTrunk
Platform
Operating System Win32
Module
Trigger Reason Access violation
User Comments Tried to access mail client
Stack Trace
PR_EnterMonitor [../../../../pr/src/threads/prmon.c, line 93]
PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c,
line 505]
USER32.DLL + 0x1b60 (0x77e11b60)
0x5e204689
From the stack, this crash points to bug 119693. Leaving this one verified fixed.
Updated•14 years ago
|
Crash Signature: [@ _pl_GetEventCount]
You need to log in
before you can comment on or make changes to this bug.
Description
•