Closed Bug 99515 Opened 23 years ago Closed 23 years ago

Autoconfig use during migration leads to hang

Categories

(Core :: Preferences: Backend, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: lrg, Assigned: mitesh)

Details

(Whiteboard: pdt+)

Attachments

(3 files)

This is potentially related to http://bugzilla.mozilla.org/show_bug.cgi?id=99221

If Autoconfig is being used while migration is taking place, a hang occurs 
(though the migration process itself seems to be successful).  This forces the 
user to manually kill the netscape process.  This bug occurs 100 percent of the 
time in the case that autoconfig is being used.

Once the application is killed, it can be restarted, and the preferences that 
were migrated will be utilized, as well as those set by autoconfig (although i 
have observed some problems, as are noted in the previous bug).
QA Contact: sairuh → lrg
The event loop in AutoConfig is hanging after the pref-migration is done.
Looks like we have to push the current Event Queue into the current list of
eventQs so that the events posted by necko will be executed in the AutoConfig
event loop.  Patch is coming
Status: NEW → ASSIGNED
Looks like with the pref-migrator, the necko events are not posted to the queue 
we are getting from the GetThreadEventQueue so replacing it with 
PushThreadEventQueue. It creates a new queue and makes it a current queue. other 
code is to pop the event queue back in the error coditions. Also, moved the 
timer code after the eventloop code so it doesn't have to deal with event 
queues.


Adding Dan for event queue code review.
This patch's success implies the old queue (the one coming from 
GetThreadEventQueue) had already been suspended. This can happen fairly easily, 
so I guess on general principles it's probably always a good idea to push a new 
queue before falling into a captive event loop like that.

r=danm

PS you might want to consider adding a little helper class to take care of 
popping the event queue; you wouldn't have to be so careful to pop the queue at 
each possible exit point. You can crib the class at http://lxr.mozilla.org/
seamonkey/source/embedding/components/windowwatcher/src/nsWindowWatcher.cpp#282. 
For its use (just two lines) look for "queueGuard" elsewhere in that same file.
Comment on attachment 49361 [details] [diff] [review]
Replacing GetThreadEventQueue with PushThreadEventQueue

Putting an r= on the patch. PDT wants it r= there. see bug 97228
Attachment #49361 - Flags: review+
sr=alecf
Attachment #49361 - Flags: superreview+
Whiteboard: requesting PDT approval
Fix checked into the trunk. Waiting for PDT approval for branch checkin

Checking in nsAutoConfig.cpp;
/cvsroot/mozilla/modules/libpref/src/nsAutoConfig.cpp,v  <--  nsAutoConfig.cpp
new revision: 3.15; previous revision: 3.14
done
lets get it in there...marking pdt+
Whiteboard: requesting PDT approval → pdt+
Fix checked into the branch.

Checking in nsAutoConfig.cpp;
/cvsroot/mozilla/modules/libpref/src/nsAutoConfig.cpp,v  <--  nsAutoConfig.cpp
new revision: 3.12.2.2; previous revision: 3.12.2.1
done                        
Tinderbox seems fine
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Branch build on Linux seems to be hanging at the main event loop with these 
changes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The main event loop is hanging because even if it has stopped accepting events, 
GetThreadEventQueue return the disabled event queue when requested to return the 
youngest event queue. Now, the events posted to it get transfered to the elder 
event queue and the loop hangs waiting for the events to this queue. I think, 
GetThreadEventQueue should return the youngest active event queue. 
 
As discussed with Danm, It looks like the event queue is leaking and it should 
be deleted when it has stopped accepting events. At this point, changing 
anything in the event loop can cause regression. So I am changing AutoConfig 
event loop to use ProcessPendingEvents() instead waitforevent and handleevent. 
ProcessPendingEvents() processes the events in the elder event queues, too.

Now, another problem is that there two event queue loops, one for the AutoConfig 
and one for pref_get_ldap_attributes(). We need to create atleast one event 
queue for pref_get_ldap_attributes() because recursive calls to 
ProcessPendingEvents() on the same event queue doesn't work. 

There are two patches, one is with reference to branch version of 
nsAutoConfig.cpp and the another one is with reference to trunk version.

Mitesh, you mentioned in your comment above that you needed to push a new event 
queue in pref_get_ldap_attributes, but I don't see that in the patches. Besides 
that concern, this is the patch we've been discussing. Cool. I'm happy to 
actually understand what's happening here. Long live ProcessPendingEvents.

r=danm

again. But this time it'll all be good.
The second event queue and the event loop is only in the patch with reference to 
the branch sources. The code is not checked into the trunk. The reason you don't 
see the PushThreadEventQueue and PopThreadEventQueue in the patch is that it's 
already in the source base as part of bug 75955 check in. 

pref_get_ldap_atribute() may be called as part of one of the events processed 
during the first event loop -> ProcessPendingEvents(). Now, this also needs to 
be finished before we can proceed further so I need another event loop to make 
it look like Sync connection. As we can see from the ProcessPendingEvents() code 
it can not be called recursivley on the same event queue so I needed to create a 
new event queue. I know it's not the best solution as we had problems with the 
main event loop hanging on the Linux earlier but with ProcessPendignEvents() it 
seems working fine so I have kept it this way. If you have any other suggestion, 
please let me know.

Thanks for your help and the code review
That's just what event queue pushing is for, so it's possible to ProcessPending 
while in the middle of ProcessPending on some previous event. I imagine that 
pushed queue isn't leaking permanently or temporarily and causing problems like 
in the case you just fixed.
Yes, that's right the new event queue is not leaking. It's actually getting 
deleted when pref_get_ldap_attributes() finishes. Looks like LDAP connection is 
not adding extra references like the other necko connection (mainly, HTTP) or 
may be ProcessPendingEvents() does some magic and helps to remove any references 
to the event queue.
Comment on attachment 50643 [details] [diff] [review]
latest patch with reference to branch

sr=alecf
Attachment #50643 - Flags: superreview+
Attachment #50643 - Flags: review+
Comment on attachment 50644 [details] [diff] [review]
latest patch with reference to trunk

sr=alecf
Attachment #50644 - Flags: superreview+
Attachment #50644 - Flags: review+
Checked into the branch and the trunk
Marking it fixed
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
I have verified this fix in the branch now, in macos8 macosX linux and windows, 
marking as verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: