Closed Bug 134070 Opened 23 years ago Closed 23 years ago

Crash dismissing dialogs [@ _pl_GetEventCount]

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: bryner, Assigned: bryner)

References

Details

(Keywords: crash, topcrash, Whiteboard: [ADT])

Crash Data

Attachments

(1 file, 3 obsolete files)

patch 23 years ago Brian Ryner (not reading) 640 bytes, patch		Details \| Diff \| Splinter Review
patch #2 23 years ago Brian Ryner (not reading) 737 bytes, patch	dougt : review+ rpotts : superreview+ asa : approval+	Details \| Diff \| Splinter Review
new patch 23 years ago Brian Ryner (not reading) 1.35 KB, patch	dougt : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
final patch 23 years ago Brian Ryner (not reading) 1.31 KB, patch	bryner : review+ bryner : superreview+ asa : approval+	Details \| Diff \| Splinter Review

Brian Ryner (not reading)

Assignee

Description

•

23 years ago

On my debug build on Linux (current trunk), I'm seeing certain dialogs (such as the mail password dialog or the "Incorrect password" dialog) cause a crash as the dialog is dismissed. This doesn't happen every time, but does happen 10% of the time for me. The following message is printed: pthread_mutex_lock failed: rv = 22 Assertion failure: 0 == rv, at /builds/test/mozilla/nsprpub/pr/src/pthreads/ptsynch.c:186 /builds/test/mozilla/obj-dbg/dist/bin/run-mozilla.sh: line 72: 12678 Aborted Further investigation shows that pthread_mutex_lock returns EINVAL, because the lock itself is no longer valid, because the PLEventQueue has already been destroyed. However, it is being told to process events from event_processor_callback in widget/src/gtk/nsAppShell.cpp.

Brian Ryner (not reading)

Assignee

Comment 1

•

23 years ago

I believe this crash will occur on debug builds due to the definition of PR_ASSERT.

Keywords: crash

Brian Ryner (not reading)

Assignee

Updated

•

23 years ago

Keywords: nsbeta1

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 2

•

23 years ago

I see this on Mandrake 8.1 with the commercial 2002-03-29-11 build: _pl_GetEventCount() PL_ProcessPendingEvents() nsEventQueueImpl::ProcessPendingEvents() event_processor_callback() our_gdk_io_invoke() libglib-1.2.so.0 + 0xfbd4 (0x40393bd4) libglib-1.2.so.0 + 0x11390 (0x40395390) libglib-1.2.so.0 + 0x1196f (0x4039596f) libglib-1.2.so.0 + 0x11b2b (0x40395b2b) libgtk-1.2.so.0 + 0x986e3 (0x402a56e3) nsAppShell::Run() nsAppShellService::Run() netscape-bin + 0x84e9 (0x080504e9) netscape-bin + 0x8cc7 (0x08050cc7) libc.so.6 + 0x1c5b0 (0x404e95b0)

Keywords: nsbeta1

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Updated

•

23 years ago

Keywords: nsbeta1

Brian Ryner (not reading)

Assignee

Comment 3

•

23 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Here's a patch which should fix the problem (or at least the manifestation of it that I was seeing). stephend, can you verify if this fixes your problem as well? We also need to check whether this causes any slowdown of startup/new window/page load.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Updated

•

23 years ago

Summary: Crash dismissing dialogs → Crash dismissing dialogs [@ _pl_GetEventCount()]

Brian Ryner (not reading)

Assignee

Comment 4

•

23 years ago

Attached patch patch #2 (obsolete) — Details — Splinter Review

Actually, use this one instead.

Attachment #76942 - Attachment is obsolete: true

Doug Turner (:dougt)

Comment 5

•

23 years ago

Brian, How is the nsEventQueue object being accessed from a thread other than the thread that owns the queue? In other words, how would PL_IsQueueOnCurrentThread be false if called from CheckForDeactivation()?

Brian Ryner (not reading)

Assignee

Comment 6

•

23 years ago

Doug, it's being accessed from the timer thread. This creates a problem because we then try to unhook glib's IO channel for the queue on the timer thread, and if glib is in the middle of dispatching an event for that queue (on the main thread), we end up accessing a deleted PLEventQueue.

Doug Turner (:dougt)

Comment 7

•

23 years ago

Can the check for thread access be higher than CheckForDeactivation()?

Gagan

Updated

•

23 years ago

Keywords: nsbeta1 → nsbeta1+

Doug Turner (:dougt)

Comment 8

•

23 years ago

Comment on attachment 76972 [details] [diff] [review] patch #2 r=dougt

Attachment #76972 - Flags: review+

Brian Ryner (not reading)

Assignee

Comment 9

•

23 years ago

taking

Assignee: dougt → bryner

Brian Ryner (not reading)

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla1.0

rpotts (gone)

Comment 10

•

23 years ago

Comment on attachment 76972 [details] [diff] [review] patch #2 sr=rpotts@netscape.com

Attachment #76972 - Flags: superreview+

Brian Ryner (not reading)

Assignee

Updated

•

23 years ago

Keywords: adt1.0.0

Doug Turner (:dougt)

Comment 11

•

23 years ago

*** Bug 134361 has been marked as a duplicate of this bug. ***

Jay Patel [:jay]

Comment 12

•

23 years ago

Is Bug 132597 a dup of this? The stack signature is the same...but that bug seems to be about logging into mail. There might be a relation, but I'm not sure. Can't hurt to take a look though. Adding topcrash since this is currently the #2 topcrash on the Trunk.

Keywords: topcrash

Jay Patel [:jay]

Updated

•

23 years ago

Summary: Crash dismissing dialogs [@ _pl_GetEventCount()] → Crash dismissing dialogs [@ _pl_GetEventCount]

Jaime Rodriguez, Jr.

Comment 13

•

23 years ago

Adding saari, gordon and joki to make sure this patch doesn't introduce a leak.

Brian Ryner (not reading)

Assignee

Comment 14

•

23 years ago

darin, can you think of any way that your checkin might have caused this?

Asa Dotzler [:asa]

Comment 15

•

23 years ago

Comment on attachment 76972 [details] [diff] [review] patch #2 a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #76972 - Flags: approval+

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 16

•

23 years ago

Jay, it's the same. Logging into mail requires typing a password (unless, of course, you have it saved).

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 17

•

23 years ago

*** Bug 132597 has been marked as a duplicate of this bug. ***

Brian Ryner (not reading)

Assignee

Comment 18

•

23 years ago

Darin, I'm not able to reproduce this with your patch for bug 128508 (nsEventQueueService.cpp, rev 1.34) backed out.

Darin Fisher

Comment 19

•

23 years ago

bryner: ok, good... i think. now, to figure out why that is causing this problem.

Darin Fisher

Comment 20

•

23 years ago

ok, nsEventQueue::CheckForDeactivation is only called from the following methods: StopAcceptingEvents ProcessPendingEvents GetEvent WaitForEvent GetYoungestActive the only method being called from nsTimerImpl is GetYoungestActive. it is being called indirectly via nsEventQueueService::GetThreadEventQueue (because of my patch). it looks like GetYoungestActive was not designed to be called from a background thread. (i'm basing that on the fact that CheckForDeactivation probably wasn't meant to be called from a background thread.) bryner's patch probably nails this problem shut. another solution would be to not call CheckForDeactivation from GetYoungestActive. a quick LXR search reveals that GetThreadEventQueue is the only method that'll lead to GetYoungestActive being called, and before my patch it was only calling GetYoungest, which didn't call CheckForDeactivation. in other words, prior to my patch, nsEventQueue::GetYoungestActive was never being executed. so, why does it need to call CheckForDeactivation? i am slightly worried that we aren't protected against SetYounger(nsnull) being called while inside GetYoungestActive. is there anything to guarantee that that won't happen? the same potential problem existed when we were calling GetYoungest, so maybe it is protected against somehow?!

Doug Turner (:dougt)

Comment 21

•

23 years ago

GetYoungestActive must be able to be called from *any* thread. This is so that thread B can find out what event queue on thread A should be used for posting. Why CheckForDeactivation in GetYoungestActive? It would seam that this check should be only called after events are processed (in ProcessPendingEvents, GetEvent). The problem is that these calls know nothing about nested events queues.

Darin Fisher

Comment 22

•

23 years ago

via IRC, timeless mentioned seeing the following crash: <timeless> fwiw, i've experience a crash near getyoungestactive, it's really odd because msvs claims that if(foo[==0]) foo->get... (...) // <- is called even though foo is listed as null <darin> ah.. so you are seeing the race condition i was afraid might exist where |foo| is mYoungerQueue (i think).

Darin Fisher

Comment 23

•

23 years ago

dougt: i agree.. GetYoungestActive must be callable from any thread, and it sounds like you agree that GetYoungestActive shouldn't be calling CheckForDeactivation?! is there any reason not to remove it?

Doug Turner (:dougt)

Comment 24

•

23 years ago

yes. risk of a leak. I feel much more comfortable if we just go with patch #2 this late in the game. For the trunk, we can rip it out and have more bake time.

Darin Fisher

Comment 25

•

23 years ago

uhm... but before my recent checkin, we were calling GetYoungest instead of GetYoungestActive. IOW, we weren't calling CheckForDeactivation. so, why call it now? where's the risk?

Doug Turner (:dougt)

Comment 26

•

23 years ago

ignore me. what darin said. we should be okay to remove CheckForDeactivation from GetYoungestActive. I would like danm to review this change also. Dan, are we hosing ourselves by removing this check from GetYoungestActive?

Jaime Rodriguez, Jr.

Comment 27

•

23 years ago

Removing adt1.0.0. Pls renominate when you have all reviews.

Keywords: adt1.0.0

Whiteboard: [ADT]

Dan M

Comment 28

•

23 years ago

My take on why Darin's patch for bug 99026 exposed this bug: my original impression scores one, even though I couldn't defend it. I think that during a single pass through ProcessPendingEvents, another event gets pushed onto the youngest queue and then the queue is inactivated. With the current version of GetYoungestEventQueue, a subsequent pass through PPE won't pick up that event. I remember being able to shoot down every such possible scenario when I reviewed that patch, but I think we must have missed this (simple) one. So I'm still concerned that an event is going unprocessed. However, I agree with y'all; I think we won't introduce any leaks by removing CheckForDeactivation altogether from GetYoungestActive, and that's the best thing to do here. Meanwhile it probably wouldn't hurt to also add Brian's PL_IsQueueOnCurrentThread patch. (I suppose Brian gets to write the combined patch and run it through process now?)

grylchan

Comment 29

•

23 years ago

*** Bug 134702 has been marked as a duplicate of this bug. ***

Doug Turner (:dougt)

Comment 30

•

23 years ago

>Meanwhile it probably wouldn't hurt to also add Brian's PL_IsQueueOnCurrentThread patch. (I suppose Brian gets to write the combined patch and run it through process now?) Be sure to assert in debug code in this case!

Matthias Versen [:Matti]

Comment 31

•

23 years ago

*** Bug 135296 has been marked as a duplicate of this bug. ***

Matthias Versen [:Matti]

Comment 32

•

23 years ago

*** Bug 135224 has been marked as a duplicate of this bug. ***

Brian Ryner (not reading)

Assignee

Comment 33

•

23 years ago

Attached patch new patch (obsolete) — Details — Splinter Review

I think this does everything dougt is suggesting. Can I get r=/sr=?

Attachment #76972 - Attachment is obsolete: true

Darin Fisher

Comment 34

•

23 years ago

Comment on attachment 77631 [details] [diff] [review] new patch i thought dougt was saying that an assertion should be enough, but perhaps it's best to beef this up anyways. how about using NS_ERROR instead of NS_ASSERTION? either way, sr=darin

Attachment #77631 - Flags: superreview+

Doug Turner (:dougt)

Comment 35

•

23 years ago

Comment on attachment 77631 [details] [diff] [review] new patch I would have made it so that on debug build PL_IsQueueOnCurrentThread isn't called twice. In either case r=dougt

Attachment #77631 - Flags: review+

Brian Ryner (not reading)

Assignee

Comment 36

•

23 years ago

Attached patch final patch — Details — Splinter Review

Implemented everyone's suggestions.

Attachment #77631 - Attachment is obsolete: true

Brian Ryner (not reading)

Assignee

Comment 37

•

23 years ago

Comment on attachment 77728 [details] [diff] [review] final patch carrying over r=dougt, sr=darin.

Attachment #77728 - Flags: superreview+

Attachment #77728 - Flags: review+

Asa Dotzler [:asa]

Comment 38

•

23 years ago

Comment on attachment 77728 [details] [diff] [review] final patch a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #77728 - Flags: approval+

scottputterman

Comment 39

•

23 years ago

adding adt1.0.0+ on behalf of the adt.

Keywords: adt1.0.0+

laurel

Comment 40

•

23 years ago

I'm seeing this with increased frequency using apr5 commercial trunk build, linux rh6.2 -- happens every single time I login (using correct password).

Brian Ryner (not reading)

Assignee

Comment 41

•

23 years ago

Fix is checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

laurel

Comment 42

•

23 years ago

*** Bug 136939 has been marked as a duplicate of this bug. ***

Charles Rosendahl

Comment 43

•

23 years ago

If it's checked in, it's back. as of the 2002041003 trunk builds.

Brian Ryner (not reading)

Assignee

Comment 44

•

23 years ago

Do you have a talkback ID of a crash dismissing a dialog from that build?

Jay Patel [:jay]

Comment 45

•

23 years ago

I don't see any crashes with this stack signature in talkback data for 4/10 builds...so if Charles did crash dismissing dialogs it must have a different stack signature. I searched by Charles email address and none of his crashes seem to be related to this bug. Charles: If you are able to reproduce this and get a Talkback incident, please let us know. Your crash will most likely need to be logged as a new bug. Verifying fixed based on Talkback data.

Status: RESOLVED → VERIFIED

Charles Rosendahl

Comment 46

•

23 years ago

Incident ID 5090976 Build 2002041003 What: Open mail client from browser, enter password, hit enter key, and crash. Happened using three different user profiles/mail accounts. Email me if you need specific configuration info (server/account/etc)

Jay Patel [:jay]

Comment 47

•

23 years ago

Charles crash looks different: Incident ID 5090976 Stack Signature PR_EnterMonitor aef27f05 Trigger Time 2002-04-11 11:29:42 Email Address carosendahl@netscape.com URL visited Build ID 2002041011 Product ID MozillaTrunk Platform Operating System Win32 Module Trigger Reason Access violation User Comments Tried to access mail client Stack Trace PR_EnterMonitor [../../../../pr/src/threads/prmon.c, line 93] PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 505] USER32.DLL + 0x1b60 (0x77e11b60) 0x5e204689 From the stack, this crash points to bug 119693. Leaving this one verified fixed.

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@ _pl_GetEventCount]

You need to log in before you can comment on or make changes to this bug.