Open
Bug 331413
Opened 19 years ago
Updated 2 years ago
selfserv is not being tested for reference leaks on Windows
Categories
(NSS :: Test, defect, P2)
Tracking
(Not tracked)
NEW
People
(Reporter: julien.pierre, Unassigned)
References
(Depends on 2 open bugs)
Details
(Whiteboard: fix incomplete)
Attachments
(5 files, 16 obsolete files)
4.88 KB,
patch
|
nelson
:
review+
rrelyea
:
superreview+
|
Details | Diff | Splinter Review |
1.88 KB,
patch
|
Details | Diff | Splinter Review | |
1.42 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
2.21 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
20.89 KB,
patch
|
Details | Diff | Splinter Review |
Even though selfserv contains an NSS_Shutdown() statement, during ssl.sh/all.sh, this statement is never executed. This means if there is a slot/object reference leak, our QA will not detect it. This is a very serious hole.
Reporter | ||
Comment 1•19 years ago
|
||
FYI, selfserv has a "stop" command which can be sent by sending an HTTP "GET /stop" request . This is what ssl.sh script should do at first to stop selfserv. It should use the SIGTERM method only if this stop command fails.
We might even want to install a special signal handler on Unix for another signal (SIGUSR1 / SIGUSR2) to send that stop command instead of the HTTP channel. This is probably more reliable.
Reporter | ||
Comment 2•19 years ago
|
||
I have tested this patch and it allows ssl.sh to cleanly stop selfserv and run the NSS_Shutdown sequence . I created artificial leaks in selfserv and verified that this worked.
However, unfortunately, all.sh does not detect the failure, because it does not check the exit code from selfserv . I don't know how to do it with a shell script. Even if I knew how to do it, changes to the script structure are needed in order to return the exit code to the caller of kill_selfserv so that it can be tested and considered before reporting a failure or success. At this time, only the client exit code is considered.
Comment 3•19 years ago
|
||
Comment on attachment 215977 [details] [diff] [review]
register SIGUSR1 signal handler, and send it from ssl.sh
r+ =relyea.
This can also help certain profiling runs (though the 'stop' command is sufficient, the kill is probably easier).
bob
Attachment #215977 -
Flags: review+
Updated•19 years ago
|
Priority: -- → P2
Target Milestone: --- → 3.11.1
Updated•19 years ago
|
Assignee: jason.m.reid → nobody
QA Contact: jason.m.reid → test
Reporter | ||
Comment 4•19 years ago
|
||
Bob,
The HTTPS stop command depends on the server operating properly and being able to receive it, which the signal handler does not. Unfortunately, the signal handler does not work for Windows. There is no SIGUSR1 on Windows, so the handler won't compile. I will produce an updated patch which compiles it conditionally for XP_UNIX . I will also modify the script so USR1 is only sent on Unix .
Reporter | ||
Comment 5•19 years ago
|
||
I ran into an odd race where selfserv would sometimes fail to PR_Bind on Solaris even though the previous selfserv had shutdown cleanly, and the shell had waited for its PID to end. Nelson suggested turning on SO_LINGER for the server socket. This seems to have taken care of this race.
We have seen similar issues in the past on Linux, which is why selfserv -b was created as well as the loop in ssl.sh . Hopefully we won't have to do the same for any other platform. Even if we do, IMO, it is worth our trouble to detect server failures !
Reporter | ||
Comment 6•19 years ago
|
||
This patch only helps :
1) on Unix platforms
2) in debug builds
3) when used by a developer interactively who is looking for core files
It causes ssl.sh to make selfserv shut down fully on Unix .
In debug builds, with NSS_STRICT_SHUTDOWN set, selfserv will dump core in NSS_Shutdown() to indicate reference leaks. The core is dumped after the SSL client connections are complete, so the client-side tests don't fail.
Currently, ssl.sh/all.sh do not detect any core files, and therefore, this patch will not cause any change to the output of the nightly QA or tinderbox.
Despite the limitations of this patch, I would like to check it in ASAP, because I think the benefits are significant for all developers on the team, most of whom use some flavor of Unix for development.
Here are the things that remain to be done, in order of increasing complexity :
1) add core file detection to ssl.sh/all.sh
This will allow tinderbox and the nightly QA to detect the selfserv reference leaks for debug builds on Unix - as well as failures from any other programs we may not yet know about.
2) check the return status from selfserv in ssl.sh
This will additionally allow tinderbox and the nightly QA to detect the reference leaks for optimized builds on Unix (this also works for checking debug builds).
3) add a mechanism to stop the server on Windows
The SIGUSR1 method does not work on Windows. We can simulate a signal by creating a Windows message queue within selfserv, and listening for a particular message. We would then have to create a special kill program that would just send this message to selfserv. This should cover all the combinations of platforms that we officially support.
I would recommend that we don't use the HTTPS STOP method because it is more likely to fail when there are bugs in SSL, and will require manual attention from the nightly QA / tinderbox operator to stop selfserv .
Attachment #215977 -
Attachment is obsolete: true
Reporter | ||
Updated•19 years ago
|
Attachment #216602 -
Flags: superreview?(rrelyea)
Attachment #216602 -
Flags: review?(nelson)
Reporter | ||
Comment 7•19 years ago
|
||
Biswatosh, can you take a look at the remaining tasks that I outlined in comment 6 ? Please let me know if you are familiar enough with Windows to work on the last task. If not, I could handle it.
Comment 8•19 years ago
|
||
Comment on attachment 216602 [details] [diff] [review]
very partial fix for the problem
This is a step in the right direction, even if it's not a complete fix. r=nelson
Attachment #216602 -
Flags: review?(nelson) → review+
Reporter | ||
Comment 9•19 years ago
|
||
Nelson,
Thanks for the review. I have checked this in to the tip.
Checking in cmd/selfserv/selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.70; previous revision: 1.69
done
Checking in tests/ssl/ssl.sh;
/cvsroot/mozilla/security/nss/tests/ssl/ssl.sh,v <-- ssl.sh
new revision: 1.65; previous revision: 1.64
done
When I get Bob's review, I will check in to the 3.11 branch .
Comment 10•19 years ago
|
||
Comment on attachment 216602 [details] [diff] [review]
very partial fix for the problem
r+ relyea
Attachment #216602 -
Flags: superreview?(rrelyea) → superreview+
Reporter | ||
Comment 11•19 years ago
|
||
Thanks, Bob. I have checked this in to the NSS_3_11_BRANCH .
Checking in cmd/selfserv/selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.68.2.2; previous revision: 1.68.2.1
done
Checking in tests/ssl/ssl.sh;
/cvsroot/mozilla/security/nss/tests/ssl/ssl.sh,v <-- ssl.sh
new revision: 1.61.2.4; previous revision: 1.61.2.3
done
Comment 12•19 years ago
|
||
Comment on attachment 216602 [details] [diff] [review]
very partial fix for the problem
This patch has some problems.
1. As the signal() man page says in the Application Usage
section, new applications should use sigaction() rather
than signal().
http://www.opengroup.org/onlinepubs/009695399/functions/signal.html
There are also limitations on what functions can be called
from a signal handler. The sigaction() man page seems to
say only reentrant functions may be called from a signal
handler:
http://www.opengroup.org/onlinepubs/009695399/functions/sigaction.html
I remember that in a pthreads program only async-signal-safe
functions may be called from a signal handler.
Because of these issues, we should avoid using signals if
possible.
2. The VLOG statement should not be moved to the stop_server
function without change:
>+void stop_server()
>+{
>+ stopping = 1;
>+ VLOG(("selfserv: handle_connection: stop command"));
>+ PR_Interrupt(acceptorThread);
>+ PZ_TraceFlush();
>+}
You can either change the VLOG statement to say "stop_server"
instead of "handle_connection", or leave the VLOG statement
in handle_connection, and add a new VLOG statement to
sigusr1_handler that say "sigusr1_handler".
3. We used to kill the selfserv process because a buggy
selfserv process may not respond to the stop command. It
is possible that a buggy selfserv process may not be stopped
by the SIGUSR1 signal handler. We still need to kill a
selfserv process that cannot be stopped cleanly.
Comment 13•19 years ago
|
||
This patch ensures that selfserv closes all of its sockets
when it shuts down cleanly. I reviewed selfserv.c for potential
socket leaks and didn't find any. I found that it closes listen_sock,
model_sock, and tcp_sock's properly on a clean shutdown. This patch
merely adds an assertion (which asserts that all the tcp_sock's have
been closed), and a PR_Close call to handle the unlikely error case
that SSL_ImportFD fails.
I still don't understand why selfserv sometimes fails to bind the
listening socket, which already has the SO_REUSEADDR socket option set,
and why setting the SO_LINGER socket option fixes this problem.
Attachment #217929 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Comment 14•19 years ago
|
||
Wan-Teh,
Thanks for the patch. I found that selfserv is never invoked with the model socket option (-m) in ssl.sh . So, the socket leak, if any, could not have been in the block that you added a PR_Close to. I filed bug 333503 on the lack of testing of the model socket option in selfserv
Comment 15•19 years ago
|
||
In bug 332348 comment 24 , Wan-Teh called for the "linger" patch to be backed out.
I offer this patch as an alternative to backing it out.
Attachment #217969 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Comment 16•19 years ago
|
||
Comment on attachment 217929 [details] [diff] [review]
Ensure that there are no socket leaks
You could also reset tcp_sock to NULL after the PR_Close .
Attachment #217929 -
Flags: review?(julien.pierre.bugs) → review+
Comment 17•19 years ago
|
||
Comment on attachment 217969 [details] [diff] [review]
patch, disable "linger" workaround
Nelson, Julien,
I have never seen any sample code that sets the
SO_LINGER socket option on a listening socket in
order to avoid the bind() failure. Could you
explain why SO_LINGER fixes this problem?
The bind() failure on Linux that Julien mentioned
in comment 5 was different. That was caused by the
difficulty of synchronizing with the termination of
all the threads in the selfserv process we killed.
(Recall that the old LinuxThreads library implements
threads as processes.)
I don't understand why you want to keep this code.
A future NSS maintainer who only reads the setsockopt()
man page or Richard Stevens' Unix Network Programming
book won't understand what this code is for.
Comment 18•19 years ago
|
||
Julien, I found that selfserv.c:handle_fdx_connection and
strsclnt.c:do_connects use another method to ensure that
tcp_sock is closed on error paths, so I decided to have
selfserv.c:handle_connection to use the same method. I
also simplified the use of braces in
selfserv.c:handle_fdx_connection.
I got the same conclusion that selfserv does not leak
sockets on a cleanup shutdown. So this patch merely adds
an assertion to assert that fact, and adds a PR_Close call
to fix a socket leak on an unlikely error path, which is
not exercised by all.sh as you pointed out).
Attachment #217929 -
Attachment is obsolete: true
Attachment #218054 -
Flags: review?(julien.pierre.bugs)
Comment 19•19 years ago
|
||
Comment on attachment 218054 [details] [diff] [review]
Ensure that there are no socket leaks, v2
r=nelson
Attachment #218054 -
Flags: review+
Comment 20•19 years ago
|
||
In answer to the questions of comment 17:
The problem occurs when rapidly and repeatedly running selfserv,
then having it end, then immediately restarting it again.
Sometimes in that sequence, the newly restarted selfserv fails to
bind, with EADDRINUSE.
In BSD-derived sockets implementations, when a program closes a socket,
the socket may live on in the kernel for a period of time afterwords,
even after the program has completely ended.
This is due, in part, to the TCP TIME_WAIT state requirement.
This can lead to the EADDRINUSE problem, even though the process that
had previously used the address/port is gone, even for several seconds.
One potential solution is to have the program that previously used the
address/port stick around until the TIME_WAIT is over, so that the
in-kernel socket (BSD TCPCB) will be gone when the process ends, and
the succcessor process will not encounter EADDRINUSE. The use of the
SO_LINGER socket option is intended to accomplish that, and seems to do so.
Setting that option on the listen socket merely causes the accepted sockets
to inherit that option from the listen socket, IINM. Alternatively,
SO_LINGER could be set on every accepted socket.
However, we found that there is another side-effect of the SO_LINGER option,
one that is undesirable. (I don't recall the details just now. Julien,
can you remind us of what that was?)
There may yet be value to having an optional ability to set SO_LINGER,
which is why I advocated leaving the code in place.
Comment 21•19 years ago
|
||
Thanks for the code review, Nelson. I checked in the
patch (with a simpler comment) on the NSS trunk (3.12).
Attachment #218054 -
Attachment is obsolete: true
Attachment #218054 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Comment 22•19 years ago
|
||
Comment on attachment 217969 [details] [diff] [review]
patch, disable "linger" workaround
Nelson,
In response to comment 20 , the bug that the linger option introduced is bug 332348 .
I don't see the point of leaving the code in but having the option turned off, so r- . If you want the option to be optional, I would be fine with that, but the patch needs to be completed to take something from the command-line.
For now, we could just surround the code that sets the option with an #ifndef WIN95 . But it would be preferrable to get to the bottom of the problem instead.
Attachment #217969 -
Flags: review?(julien.pierre.bugs) → review-
Reporter | ||
Comment 23•19 years ago
|
||
Wan-Teh,
In response to comment 17, Nelson has explained the reason why I put it in. It doesn't make a lot of sense to me why this works. The race in itself is not always reproducible, so it's not easy to verify the fix.
Several observations here :
1) Windows is not doing a clean shutdown yet, only Unix is. That's because there is nothing to trigger the clean shutdown on Windows at this point - no SIGUSR1 equivalent in the script, and no signal handler.
The selfserv process is still being killed as it always was.
SO_LINGER was put in to solve issues with the clean shutdown on Unix. So, it wouldn't really be wrong to surround the linger setting code with #ifndef WIN95 or even #ifndef WIN32 .
2) Even though setting the linger option may not be necessary on Windows at this time, it should also never be harmful. The worst that should happen is that some sockets take 1 second more to close in selfserv. But that's not what we see - we see a deadlock.
Comment 24•19 years ago
|
||
Julien, please back out your change.
Reporter | ||
Comment 25•19 years ago
|
||
Attachment #218100 -
Flags: review?(nelson)
Comment 26•19 years ago
|
||
Comment on attachment 218100 [details] [diff] [review]
Disable SO_LINGER on the only platform where it causes problems, on which it is also not needed since full selfserv shutdown is not implemented yet
This code needs a comment. The purpose of setting
the SO_LINGER socket option and why it's commented
out for the WIN95 configuration are not obvious.
The SO_REUSEADDR socket option is designed to allow
a process to bind to a port that's in the TIME_WAIT
state. Do you know why SO_REUSEADDR doesn't work?
That's the root cause of the problem.
Reporter | ||
Comment 27•19 years ago
|
||
Wan-Teh,
No, I don't know why setting SO_LINGER fixes the problem . The problem is very elusive. I tried to reproduce it today for a few hours but could not.
Attachment #218100 -
Attachment is obsolete: true
Attachment #218109 -
Flags: review?(wtchang)
Attachment #218100 -
Flags: review?(nelson)
Comment 28•19 years ago
|
||
Comment on attachment 218109 [details] [diff] [review]
add a comment and fix a typo
r=wtc. I'd prefer removing the code, but
I don't want to waste you and Nelson too
much time.
Two suggested changes.
1. Use one of our two preferred multi-line
comment styles:
/*
* line 1
* line 2
*/
or
/*
** line 1
** line 2
*/
2. Replace "on WIN95" with "in the WIN95
build configuration".
Attachment #218109 -
Flags: review?(wtchang) → review+
Reporter | ||
Updated•19 years ago
|
Assignee: nobody → julien.pierre.bugs
Comment 29•19 years ago
|
||
(In reply to comment #26)
> The SO_REUSEADDR socket option is designed to allow
> a process to bind to a port that's in the TIME_WAIT
> state.
Yes, and there are several ways to use it, including:
a) set the REUSEADDR option on the socket while it's bound and in use
(the socket that may potentially go into TIME_WAIT state), saying in effect
"I don't want to exclusively tie up this port", or
b) set the REUSEADDR option on the new socket that you're about to try to
bind to the address/port that's still in use, saying "I want to use this
port, even if it is already/still in use by another process, or
c) both of the above.
SO_LINGER clearly reduces the window of vulnerability to this port collision,
by delaying the start of the second process that will create the new socket
that will contend for the same port.
Have we played with SO_REUSEADDR in any of the 3 ways described above, to
see how they impact this problem?
Reporter | ||
Comment 30•19 years ago
|
||
Nelson, in response to comment 29, we are using option b) - setting PR_SockOpt_Reuseaddr linger on the listen socket before binding. Any connection socket accepted from it should inherit that option.
Reporter | ||
Comment 31•19 years ago
|
||
Bug 332348 has been identified as the cause of the problems with linger on WIN95, so I'm marking that bug as a dependency. I would rather not checkin patch 218109, which disables linger on WIN95 in selfserv. Instead, we can do the WIN95 QA for NSS with a version of NSPR that includes the fix.
Depends on: 332348
Comment 32•19 years ago
|
||
I agree with comment 31. Thanks for carrying this issue this far!
Reporter | ||
Comment 33•19 years ago
|
||
Wan-Teh,
Re: comment 12,
1) We have been running with the signal handler for several weeks . There is no evidence of any problem related to functions not being re-entrant. That's not proof there isn't a problem, but it is re-assuring. We could change the logic to have selfserv start a thread on startup that would wait for a semaphore before doing the stop_selfserv(), and have the signal handler just post it.
2) I agree. I will provide a patch later to fix the logging.
3) You are right. Christophe made the same observation. Fortunately, we haven't had any instance of the signal handler failing to do its job so far.
But we should fix the script to fall back to unqualified kill (which implies SIGTERM ? or is it SIGKILL).
Comment 34•19 years ago
|
||
Comment on attachment 218109 [details] [diff] [review]
add a comment and fix a typo
Julien,
Please check in this patch (commenting out the setting
of the SO_LINGER socket option in the WIN95 build
configuration), or a similar patch that only sets the
SO_LINGER socket option on the platforms that need it.
The root problem is that the SO_REUSERADDR socket option
isn't working on some platforms (this bug report only
mentions Solaris in comment 5). I hope that you agree
that that's a bug and the setting of the SO_LINGER socket
option is a workaround. If so, then this workaround
should only be used on the platforms that need it,
especially now that we know it manifested an NSPR bug.
I don't have time now to write a proper fix for that
NSPR bug now. I am grateful that you and Nelson verified
my proposed approach worked, but it takes a lot more time
to turn a proof-of-concept patch into a proper fix.
Reporter | ||
Comment 35•19 years ago
|
||
Wan-Teh,
I checked in the modified patch 218109 to both the trunk and the branch.
Checking in selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.72; previous revision: 1.71
done
Checking in selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.68.2.3; previous revision: 1.68.2.2
done
Reporter | ||
Comment 36•19 years ago
|
||
Wan-Teh,
This fixes some of the issues you reported in comment 12 . Given that things work reliably right now, I didn't feel it was necessary to change the content of the code in the signal handler.
If we start getting failures, we can create an extra thread which will call stop_server(), and have sigusr1_handler post a condition variable to notify it. Hopefully that should always be recognized as a safe thing to do within the signal handler function.
I also did not yet implement the script change to fall back to SIGTERM if SIGUSR1 fails. I'm tempted to use the method we have for the Linux bug, which is to try selfserv -b, and if it fails after say, 5 seconds, use SIGTERM. Do you think that approach is OK ? I would rather have a wait command with a timeout option, but I don't think there is one, at least not built-in to the shell.
Attachment #218916 -
Flags: review?(wtchang)
Comment 37•19 years ago
|
||
Comment on attachment 218916 [details] [diff] [review]
replace signal with sigaction; fix logging
r=wtc. Eventually we should use a thread to
handle the signal as you described in comment
33. That thread could also wait for the
SIGUSR1 signal in a sigwait call. The sigwait
method requires more setup (all other threads
in the process need to block the SIGUSR1 signal).
(See the Rationale section in
http://www.opengroup.org/onlinepubs/009695399/functions/sigwait.html.)
Attachment #218916 -
Flags: review?(wtchang) → review+
Comment 38•19 years ago
|
||
Julien, I just noticed that your comment 36.
In the signal handler, you can post a semaphore, but you
can't signal a condition variable. (None of the pthread
functions can be called from a signal handler.) So what
you originally described in comment 33 is better.
The shell has no built-in way to wait for a process with
a timeout. A common solution is to fork a child process
that sleeps some time (the timeout) and kills the selfserv
process, and to have the original (parent) process kill
the child (killer) process after the wait command returns.
I seem to recall this method doesn't work well in older
versions of MKS Korn Shell or Cygwin bash on Windows.
This method is used in
http://lxr.mozilla.org/nspr/source/nsprpub/pr/tests/runtests.sh.
The selfserv -b method is really a hack, but I guess we can
use it if the fork-sleep-kill method doesn't work.
Reporter | ||
Comment 39•19 years ago
|
||
Thanks for the review, Wan-Teh. I checked patch 218916 to the tip .
Checking in selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.73; previous revision: 1.72
done
On the branch I'm checking in both this patch and patch 218080 (ensure that there are no socket leaks), which was only checked in to the tip before.
Checking in selfserv.c;
/cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v <-- selfserv.c
new revision: 1.68.2.4; previous revision: 1.68.2.3
done
Reporter | ||
Comment 40•19 years ago
|
||
Wan-Teh,
Re: comment 38, :
I'm wondering how the selfserv -b trick actually works on Linux given that we set SO_REUSEADDR on the listen socket . The bind should never fail with this option, right (yet I know we have seen it fail in the past on Linux, and I've seen it fail on Solaris recently too) ? So this can't really be a reliable cross-platform indicator that the server is gone. Maybe if we tried to bind without SO_REUSEADDR and waited for it to succeed, it would be a more reliable test.
My vote would be for adding a "wait <pid> <timeout>" tool in our NSS source tree. Hopefully it's not too complicated to implement with NSPR's PR_WaitProcess .
Are you sure the semaphore APIs (sem_post, right ?) are safe to use in signal handlers ? I have seen references that pthreads APIs are unsafe, and SCO considers the sem_ functions part of pthreads, though they don't have the pthread_ prefix. The man page for sem_post on Solaris says that
The sem_post() function is reentrant with respect to signals
and may be invoked from a signal-catching function.
But I wonder if this is true of all platforms.
I would certainly like to minimize the platform differences from the point of view of our test scripts. Maybe we can also have a "stop_selfserv" program in our source tree, which will do the right thing on any platform - eg. send a window message on Windows, or send a USR1 signal on Unix. And if that doesn't stop selfserv correctly, the program could fall back to doing the kill itself, rather than having that logic in the script. I see value in that especially as we will run into issues with selfserv running multi-process. It makes sense to take advantage of process groups on Unix for that case, which aren't available on Windows. It's probably best to have that complex platform-specific logic in a C program rather than a script.
Reporter | ||
Comment 41•19 years ago
|
||
Work was start on this bug in 3.11.1, but not completed. Changing target to 3.11.2.
Target Milestone: 3.11.1 → 3.11.2
Reporter | ||
Comment 42•18 years ago
|
||
Retargetting all P2s to 3.11.3 .
Target Milestone: 3.11.2 → 3.11.3
Comment 43•18 years ago
|
||
(In reply to comment #40)
On Unix side, I wrote a small C program which sends USR1 signal to all running selfservs(one or many) and sleeps for some defined time and then sends KILL signal to all of them. I tested this by using it with ssl.sh and seemed to be working. selfserv.c will dump its pid to a file ,say PIDBase.
The core logic(not the full pgm with all validations)to kill all selfserv/s is this:
***************************************************************
fp = fopen("PIDBase","r");
while (fscanf(fp,"%d",ptr) >= 0)
{
i = kill(ptr,SIGUSR1);
}
fseek(fp,SEEK_SET,0);
sleep(SLEEPTIME);
while (fscanf(fp,"%d",ptr) >= 0)
{
i = kill(ptr,SIGKILL);
}
****************************************************************
Now,this program stopserver.c(a proposed new entry in cmd folder),will give us
an executable stopserver. We shall call this in ssl.sh,as soon we are done with
testing with selfserv. It will thus replace kill_selfserv in all places in ssl.sh.
Once the Windows side is done,the same stopserver.c will handle both of this
windows and unix side.
Another point which might come up is ,all the children selfservs are having the first selfserv as their parent. Should we do a wait for these children? I read that even if for some temporary time,some of these children become zoombies, the init in Unix will become their parent and execute a wait call and remove the process id from the zoombie table. In that case, should we bother about it?
Comment 44•18 years ago
|
||
Julien,
For simulating signals in windows you had once mentioned about LSF.
(http://www.ms.washington.edu/Docs/LSF/LSF_4.2_Manual/programmer_4.2/4-advanced7.html )
Did you refer to that just to site an example or you have exactly LSF in mind to tackle this problem on signals in windows?
Because,if we actually plan to have LSF installed then this shall become a requirement for running ssl.sh with multiple selfservs on Windows.
Comment 45•18 years ago
|
||
Another thing. We plan to know the exit status of selfserv. If selfserv is started in multiprocess mode,do we need exit status of all children?
One immediate way is to dump the status along with the pid to a file but what way/how will it be used? If we run all.sh and suppose one selfserv fails, do we stop any further execution?
If we are interested in knowing the exit status of the parent selfserv only,it could also be done by creating an env var and putting the exit status to that(inside ssl.sh) and examining that variable in all.sh. So,let us decide first what exaclty we want with the status.
Comment 46•18 years ago
|
||
(Regarding comment #44)
MKS and Cygwin seem to understand kill function in a C program.
1)http://www.mkssoftware.com/docs/man3/kill.3.asp
2)http://www.cygwin.com/cygwin-ug-net/
And,LSF says that the sending of signal through bkill can be
done by LSF by actually sending a job control notification and
it has to be caught by a LSF aware Windows application.
But then, it means we have to have LSF with us.
Comment 47•18 years ago
|
||
Bishwatosh, although all of our NSS QA command line utilities run on the
Windows OS, they are not "windows applications". The latter interact with
the Windows windowing system by receiving messages from the windowing system
in a loop, and responding to them. None of the NSS tools work that way.
So, even LSF does not help.
Yes, most unix shell command packages for windows implement 'kill'.
However, unlike Unix kill, the emulator packages only implement a very
limited subset of Unix' signals. None of them emulates "user" signals,
e.g. SIGUSR1, afaik.
Comment 48•18 years ago
|
||
(In reply to comment #47)
Nelson,
It looks both MKS and Cygwin now support used defined signals,like SIGUSR1 and SIGUSR2.
Pls see:
1) http://www.mkssoftware.com/docs/man3/api_intro.3.asp#Supported_Signals
and
2)http://www.cygwin.com/cygwin-ug-net/
In (2),you can find the table of signals supported in cygwin-ug-net-nochunks.html#id4718430
Reporter | ||
Comment 49•18 years ago
|
||
Biswatosh,
Re: comment #43, we should try to wait for all the child processes. Otherwise, the port on which selfserv runs may be held in use, and the next selfserv might fail.
Zombie processes are likely to cause problems.
Re: comment #44, I only pointed you to LSF as an example. You shouldn't link with any toolkit-specific libraries. I think the article provides enough information to write your own signal emulation routines using Windows messages.
Re: comment #45, we care about all selfserv processes, not just the parent one. If any one process has an error code, then it should be reported as a failure.
Nelson,
Re: comment #47, our tools indeed do not process windowing messages on Windows today. I suggested that be changed specifically for selfserv for the purpose of being able to receive the equivalent of a "SIGUSR1" signal. selfserv will just need to create an additional thread that will only wait for windowing messages. When it receives the message, it can take the same action that the SIGUSR1 signal handler already takes on Unix.
Comment 50•18 years ago
|
||
(In reply to comment #49)
Julien,
I tried with Windows Messages and we can pass messages from one application to another in the following way(This is the crux of the code,not with all validations and all -intent is to give the idea).
Say, multiple selfservs are running and there is a separate exe called stopselfserv.exe.
Now,in stopselfserv.exe, I just include one line :
********************************************
PostMessage(HWND_BROADCAST,WM_USER,0,0);//under ifdef WINDOWS,of course
********************************************
In place of WM_USER,we should be able to use our own message provided we
register it.
And,in selfserv.c, we add the lines:
************************************************************
WNDCLASS wc;
HWND hwnd;
wc.lpfnWndProc = MainProc;//we shall define MainProc somewhere in selfserv.c, but it won't have any use. This is being done to register the class successfully.
wc.lpszClassName = "Main";
RegisterClass(&wc);
hwnd = CreateWindow("Main","MainWindow",WS_OVERLAPPEDWINDOW,0,0,170,55,0,0,hInst,0) ;
while(1)
{
i = PeekMessage( &Msg,hwnd, 0, 0,PM_NOREMOVE );
if(i == 1){
stop_server();
}
************************************************************
Now,Windows SDK probably suggests that this code should be within
WinMain. But,both MKS and Visual studio are successfully able to
compile and execute this,if written within "main". I already had tested this using sample codes.
Now,I am trying to figure out as where should I place PeekMessage()
in selfserv.c.
You were suggesting to call a separate thread from selfserv.exe ,which shall
Peek for Messages. But,even if it 'Peeks' and finds a message,it has to
inform the main thread. So,the main thread again must be waiting for that.
If that is so,why can't we wait for the message from stopselfserv,directly in the main thread?
Another thing, I didn't like the idea of actually having to call CreateWindows.
But,I tried without it but it didnt work. In the same application, Peek and Post can work together but when different applications are there, it did not work in my case.
Reporter | ||
Comment 51•18 years ago
|
||
Biswatosh,
The main thread is already busy in its accept loop, in the function called "do_accepts" . It can't be both waiting in accept and waiting for a window message. This is why there needs to be a second thread on Windows to wait for window messages. The notification of the main thread can be handled the same way as on Unix, using the PR_Interrupt() call which is already in stop_server() . All you should need to do is call stop_server() when you receive the appropriate message.
I don't think you will find a way to do this without creating a window. It's perfectly fine to do so as long as you put it in #ifdef WIN32 . Try to do this in isolated functions rather than pepper the #ifdef within the common code.
Another note I would like to add is that you should make sure you are only sending the message to the intended processes.
There could be multiple simultaneous but unrelated instances of selfserv running on the machine, for example on different ports. This happens in our nightly QA sometimes when the tip, pkix and jes QA runs run at the same time. So, you need to make sure you don't shut down the wrong process. If you embed the port value as part of the message, that might do the trick.
Reporter | ||
Updated•18 years ago
|
Assignee: julien.pierre.bugs → biswatosh.chakraborty
Comment 52•18 years ago
|
||
(In reply to comment #51)
Julien,
After some expts.,it now looks that on windows, a thread many times is unable to interrupt the main thread by calling PR_Interrupt(),when the main thread
is executing a blocking call,like PR_Accept(). The same problem does not recur in Solaris. NSPR page says that the Interrupt might take 5 secs to be successful but I wait for several mins. yet the main thread is not interrupted. However,if I relace PR_Accept() by any other nonblocking call or if I set the socket option as nonblocking, the thread is able to interrupt. Interestingly,in all cases,PR_Interrupt() returns success although the blocking call remains in that situation unless a connection is actually made and then the moment it goes to the next instructions, the thread is able to interrupt the main thread.
The situation becomes more prominent when a single process P creates multiple processes P1,P2,P3,.. and each Pi creates a thread 't' which calls PR_Interrupt() by passing the main thread of Pi. And,of course each Pi calls PR_Accept(). Here, on windows, all Pi's remain blocked at the position of PR_Accept() despite all PR_Interrupt() returning success. This situation is never there on Solaris(I presume ,on all unix flavours) but is always there in windows(I am using win XP).
For,demonstrating this, I might suggest playing with a small code ,which just creates a thread and then calls PR_Accept(). Let the thread call PR_Interrupt(),by passing the main thread to it. This succeeds on Solaris but fails many times on windows. The sample code is not with all validations and have lots of printf statements for debugging.
If this problem would not have been there, the windows issue could have been closed. Because, the logic of calling Postmessage() to all top level windows with a particular registered msg, and continously calling PeekMessage() from a thread, works. It is able to catch the message and eventualy is able to
call stop_serrver(). But, although PR_Interrupt() of stop_server() returns SUCCESS, it is not able to actually interrupt PR_Accept() of the main thread.
The code can be the following:
*********************************************************************************
#include <stdio.h>
#include <string.h>
#include "secutil.h"
#if defined(XP_UNIX)
#include <unistd.h>
#endif
#if defined(_WINDOWS)
#include <process.h> /* for getpid() */
#endif
#include <signal.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include "nspr.h"
#include "prio.h"
#include "prerror.h"
#include "prnetdb.h"
#include "prclist.h"
#include "plgetopt.h"
#include "pk11func.h"
#include "secitem.h"
#include "nss.h"
#include "ssl.h"
#include "sslproto.h"
#include "cert.h"
#include "certt.h"
static int maxThreads = 8;
/* data and structures for shutdown */
static int stopping;
static PRThread * acceptorThread;
void stop_server()
{
int i = -1;
int ret = -1;
printf("\nstop_server called. pid is %d\n",getpid());
stopping = 1;
ret = PR_Interrupt(acceptorThread);
if(ret == PR_SUCCESS)
printf("\nInterrupt was a SUCCESS,pid %d\n",getpid());
else
printf("\nInterrupt was a FAILURE,pid %d\n",getpid());
PZ_TraceFlush();
printf("\nstop_server exiting. pid is %d\n",getpid());
}
int WaitforMsg()
{
printf("\nThread called...............pid is %d\n\n\n\n\n",getpid());
stop_server();
}
int
main(int argc, char **argv)
{
PRFileDesc * listen_sock;
PRStatus prStatus;
PRNetAddr addr;
PRSocketOptionData opt;
unsigned short port = 0;
int listenQueueDepth = 5 + (2 * maxThreads);
PRErrorCode perr;
PRProcess * newProcess[10];
PRProcessAttr * attr;
int i;
PRInt32 exitCode;
PRStatus rv;
int ctr = 0;
addr.inet.family = PR_AF_INET;
addr.inet.ip = PR_INADDR_ANY;
addr.inet.port = PR_htons(port);
listen_sock = PR_NewTCPSocket();
if (listen_sock == NULL) {
printf("\nPR_NewTCPSocket error\n");
}
opt.option = PR_SockOpt_Nonblocking;
opt.value.non_blocking = PR_FALSE;
prStatus = PR_SetSocketOption(listen_sock, &opt);
if (prStatus < 0) {
printf("\nPR_SetSocketOptionPR_SockOpt_NonblockingError\n");
}
opt.option=PR_SockOpt_Reuseaddr;
opt.value.reuse_addr = PR_TRUE;
prStatus = PR_SetSocketOption(listen_sock, &opt);
if (prStatus < 0) {
printf("\nPR_SetSocketOptionPR_SockOpt_Reuseaddr\n");
}
prStatus = PR_Bind(listen_sock, &addr);
if (prStatus < 0) {
printf("\nPR_BindError\n");
}
prStatus = PR_Listen(listen_sock, listenQueueDepth);
if (prStatus < 0) {
printf("\nPR_ListenError\n");
}
attr = PR_NewProcessAttr();
if (!attr)
printf("\nPR_NewProcessAttrError\n");
rv = PR_SetFDInheritable(listen_sock, PR_TRUE);
if (rv != PR_SUCCESS)
printf("\nPR_SetFDInheritableError\n");
printf("\nselfserv: do_accepts: starting\n");
PR_SetThreadPriority( PR_GetCurrentThread(), PR_PRIORITY_HIGH);
acceptorThread = PR_GetCurrentThread();
PR_CreateThread(PR_USER_THREAD,WaitforMsg,NULL,PR_PRIORITY_NORMAL,PR_LOCAL_THREAD,PR_UNJOINABLE_THREAD,0);
while (!stopping) {
PRFileDesc *tcp_sock;
PRCList *myLink;
printf("\n\n\nselfserv: About to call accept. pid is %d, stopping is %d\n",getpid(),stopping);
tcp_sock = PR_Accept(listen_sock, &addr, PR_INTERVAL_NO_TIMEOUT);
printf("\n\n\nselfserv: called accept. pid is %d, stopping is %d\n",getpid(),stopping);
}
printf("selfserv: Closing listen socket.\n");
if (listen_sock) {
PR_Close(listen_sock);
}
return SECSuccess;
}
********************************************************************************
Comment 53•18 years ago
|
||
Biswatosh, The essence of your comment 52 is to report a bug (or at the very
least, a platform inconsistency) in NSPR on Windows. It may be summarized
as "Threads blocked in PR_Accept are not interrupted by PR_Interrupt".
Please file an NSPR bug about this. Please attach the sample source code
as an attachment to that bug, not as a comment in line.
If the solution to that problem is a prerequisite to solving this bug,
please mark that bug as blocking this one, by entering this bug number into
the NSPR bug's list of bugs depending on it (being blocked by it).
Thanks.
Reporter | ||
Comment 54•18 years ago
|
||
Biswatosh,
I think your code has a problem when creating the message loop thread.
You are using PR_LOCAL_THREAD . Please don't . This may cause NSPR to create an NT fiber, which could run in the same native thread as other threads in the process. Thus it would be logical that one thread would be unable to interrupt the other. Try PR_GLOBAL_THREAD instead . I suspect this will solve your problem.
If not, try setting the environment variable NSPR_USE_NATIVE_THREADS_ONLY to 1 and then run your program again.
Please report on whether either of these suggestions solves your problem, and if so, which one.
Reporter | ||
Comment 55•18 years ago
|
||
There are several other problems with your code :
1) your WaitForMsg function has the wrong signature :
int WaitforMsg()
It has to be :
void WaitForMsg(void * arg)
Even if you aren't using the argument. The compiler should give you a warning on the PR_CreateThread due to that function signature problem. Don't ignore it. This may cause your function to never get executed.
2) There is a race in your code. Your WaitForMsg function may be executed before the main thread is in PR_Accept . Ie. PR_Interrupt could run before the main thread is even in PR_Accept . But since WaitforMsg sets the global variable "stopping" to 1 , depending on the scheduling order, your main thread may never even get into PR_Accept at all . This is probably what happens on Solaris and why your program exits there. But the scheduling on Windows could be different. You can add debug statements to confirm why the main thread actually exits - whether because PR_Accept got interrupted or because stopping was PR_TRUE .
Your WaitForMsg function needs to actually wait for a Windows message first, so that it will call PR_Interrupt while the main thread is actually in PR_Accept .
Comment 56•18 years ago
|
||
You should be able to interrupt a PR_Accept call, whether it is
a PR_LOCAL_THREAD or PR_GLOBAL_THREAD that calls PR_Accept. In
your NSPR bug report you should specify whether the build configuration
is "WINNT" or "WIN95".
PR_Interrupt does not wait until the target thread is interrupted.
The PR_SUCCESS value returned by PR_Interrupt means an interrupt
request has been sent to the target thread successfully.
Comment 57•18 years ago
|
||
(In reply to comment #54 and #55)
Julien,
1)
With the sample code that I had given in comment #52, I verified that PR_Accept() is indeed being called and it stops there despite getting interrupted by another thread. It is not that the variable "stopping" is becoming 1 before calling PR_Accept() on solaris. I verified this by including printf statements just before and after PR_Accept() call.
2)
I simplified the code and removed the stopping variable and now there is no while loop. Just a single call to PR_accept() in main(). Even then, it is hanging there on Windows and not hanging on solaris.
3)
Plus, I tried with PR_GLOBAL_THREAD_OPTION in PR_CreateThread() as well with NSPR_USE_NATIVE_THREADS_ONLY=1 as an env. variable (either and both) and with both the original sample code and the revised one. The results are same in that it hangs in windows in PR_Accept() call and do not hang on solaris. Although, PR_Interrupt() returns SUCCESS in all cases.
4)
I had also changed the signature of WaitforMsg() to WaitforMsg(void*) and still it did not work. In any case,even with the previous signature, I verified that the function Waitformsg() was getting called. I know it because I had given printf statements within it and it came on the console.
5)Now the question. Do I therefore raise this as a bug in NSPR? IMO, I should.
Comment 58•18 years ago
|
||
Please read comment 53.
Reporter | ||
Comment 59•18 years ago
|
||
Biswatosh,
I have confirmed your NSPR bug report. I did this by adding a PR_Sleep(PR_SecondsToInterval(5));
statement before your PR_Interrupt in the test program. This causes PR_Interrupt to be delayed by 5 seconds, so I'm reasonably confident that the main thread is already in PR_Accept .
I observed that PR_Interrupt does not interrupt PR_Accept on Windows . But this only happened in the Windows NT target. If you target Win95, it works. Please set your environment variable OS_TARGET to WIN95 . Then rebuild your NSPR and NSS . This should work around this problem for now. I will file a separate bug on NSPR for this issue.
Comment 60•18 years ago
|
||
(In reply to comment #59)
Thanks for confirming that Julien.
Here are certain observations.
1)I made OS_TARGET=WIN95 and rebuild nspr and nss and found that PR_Accept()
was getting interrupted by another thread calling PR_Interrupt().
2)Nelson suggested another idea. Create another thread and from there connect to the server. Make stopping=1. Naturally, PR_Accept() would be executed and it will go to the next instruction. Here ,we won't require a PR_Interrupt().
I tried that and found it working for multiple selfservs as well.
Here is a sample code
**************************************************************
void stop_server()
{
PRHostEnt hostentry;
char buf[PR_NETDB_BUF_SIZE];
PRNetAddr addr;
PRFileDesc *socket = NULL;
int port = 8443;
int exitStatus = 0;
PR_GetHostByName("localhost", buf, sizeof(buf), &hostentry)
addr.inet.family = PR_AF_INET;
addr.inet.port = PR_htons((short)port);
addr.inet.ip = *((PRUint32 *) hostentry.h_addr_list[0]);
socket = PR_NewTCPSocket();
stopping = 1;
PR_Connect(socket, &addr, PR_INTERVAL_NO_TIMEOUT);
PR_Shutdown(socket,PR_SHUTDOWN_BOTH);
PR_Close(socket);
PZ_TraceFlush();
}
**************************************************************
3)Now, the remaining problem is how to wait for some definite time and then kill all the processes forcefully.
The ways could be the following:
a)In the shell script itself, sleep for sometime and kill using -9 all pids. We can collect the pids from the PIDFile, to which would be dumped all PIDs of the processes as soon as they start. This dumping of pids would be done in the main() of selfserv.c itself.
b)Write a C program which would do the same and call it from the shell script.
But, this program can't call PR_KillProcess() as it would require to know PRProcess. All it can know are pids of the selfservs from the file. Similarly, it cant call
TerminateProcess() as it wont know the handles of those processes. And, since this is a different process, ExitProcess()wont do either.
c)Within selfserv.c, create a thread which will sleep for sometime and then get the handle of the current process and pass the handle to TerminateProcess()
by doing this
*****************************************************************
PR_Sleep(PR_SecondsToInterval(200));
TerminateProcess(GetCurrentProcess(),0);
*****************************************************************
or by calling ExitProcess() after PR_Sleep().
I tried this and it does get partially successful but just just hangs in PR_Clean() call in main() of selfserv.c. PR_Clean() is almost at the end of main(). And it looks, PR_Clean() waits for the thread (which calls TerminateProcess()) to end and thus automatically waits for the same period of time as given in PR_Sleep().
Now, comparing (a),(b) and (c), it appears that (a) is the sole option.
So, the questions are
1)Whether to go the PR_Interrupt() way and test,close the code by keeping OS_TARGET=WIN95 or to use the idea of connecting to the server from a thread?
2)For killing all the processes of selfserv by force, whether there is any other method? If no, then should we take the approach of (a)?
Reporter | ||
Comment 61•18 years ago
|
||
Biswatosh,
2) In response to comment 60, doing a PR_Connect to shutdown the accept thread may be acceptable, but only if you are sure that the server is actually accepting connections. If it is not, the PR_Connect itself may be block. So you may be creating another problem while attempting to solve one. Since PR_Connect can be successfully interrupted unlike PR_Accept, you might use another thread to PR_Interrupt it. But this gets really messy ! You are welcome to do that for now since there is no other solution at the moment.
But later, we should remove that logic and just use PR_Interrupt, once bug 347963 is fixed.
3)
b) The limitation of the NSPR API and PRProcess structures is very unfortunate. Thanks for pointing it out. I filed bug 347963 to request a function that returns a PRProcess from a PID . I believe it not a difficult feature to add, but it needs to be done on all platforms, so it may take some time before it is implemented on all of them .
In the meantime, you can actually call TerminateProcess knowing just the PID of the target. To get the handle for the target process from the PID on Windows, call OpenProcess(PROCESS_ALL_ACCESS, PR_FALSE, PID) . See the MS SDK documentation for more details. I would use this to implement the fix for 347963 . You can try this directly in native code and let me know if it works for you.
Comment 62•18 years ago
|
||
(In reply to comment #61)
Julien,
I tried your suggested way to get process handle from OpenProcess() and then called TerminateProcess() and it worked.
On Windows, as OS_TARGET=WIN95, I now have tried with all the modifications and new functions in place(with PR_Interrupt), and it seems to be working as desired.
I created single as well as multiple servers and all of them were closed gracefully as well as by force.
Here are the new functions(If these are OK then I will put all error conditions and validations and submit for final review, these are written here just to let you know the main logic, so it is raw.).
*******************************************************************
selfserv.c--changes
********************************************************************
#if defined(_WINDOWS)
#include <process.h> /* for getpid() */
#include <windows.h>
LRESULT APIENTRY MainProc(HWND,UINT,WPARAM,LPARAM);
HINSTANCE g_hInst;
int sslport = 1;
#endif
void stop_server()
{
stopping = 1;
PR_Interrupt(acceptorThread);
PZ_TraceFlush();
}
int WaitforMsg()//this is the thread which will wait for window msg
{
int i = -1;
HWND hwnd = NULL;
MSG Msg;
UINT wMsgFilterMin = 0;
UINT wMsgFilterMax= 0;
UINT MsgNo = 0x8034;
WPARAM wParam;
LPARAM lParam;
int regmsg = -1;
int msgrcvd = 0;
WNDCLASS wc;
HINSTANCE hInst;
PRIntervalTime nrval = PR_SecondsToInterval(1);
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hbrBackground = (HBRUSH) GetStockObject(LTGRAY_BRUSH);
wc.hInstance = hInst;
wc.hCursor = LoadCursor(NULL,IDC_ARROW);
wc.hIcon = LoadIcon(NULL,IDI_APPLICATION);
wc.lpfnWndProc = MainProc;
wc.lpszClassName = "Main";
wc.lpszMenuName = NULL;
wc.style = CS_HREDRAW | CS_VREDRAW;
RegisterClass(&wc);
hwnd = CreateWindow("Main","Main Window",WS_OVERLAPPEDWINDOW,0,0,170,55,0,0,hInst,0);
while(1)
{
i = PeekMessage( &Msg,hwnd, 0, 0,PM_NOREMOVE );
if(i == 1 && sslport == 8443){//check port also
msgrcvd = 1;
break;
}
}
if(msgrcvd){
stop_server();
}
return 0;
}
SECStatus
do_accepts(
PRFileDesc *listen_sock,
PRFileDesc *model_sock,
int requestCert
)
{
PRNetAddr addr;
PRErrorCode perr;
#ifdef XP_UNIX
struct sigaction act;
#endif
VLOG(("selfserv: do_accepts: starting"));
PR_SetThreadPriority( PR_GetCurrentThread(), PR_PRIORITY_HIGH);
acceptorThread = PR_GetCurrentThread();
//this is the new line PR_CreateThread(PR_USER_THREAD,WaitforMsg,NULL,PR_PRIORITY_LOW,PR_LOCAL_THREAD,PR_UNJOINABLE_THREAD,0);
....
....
...
}
int main(int argc,char** argv){
...
FILE *fp = NULL;
//dump pids to PIDBase
fp = fopen("PIDBase","a");
if(fp != NULL){
fprintf(fp,"%d\n",getpid());
fclose(fp);
}
//sslport is global. assign port to it.
sslport = port;
....
}
*****************************end of selfserv.c******************************
//stopselfserv.c is to gracefully end all selfservs
***********************stopselfserv.c******************************
MSG Msg;
int regmsg = -1;
regmsg = RegisterWindowMessage("stopselfserv");
PostMessage(HWND_BROADCAST,regmsg,0,0);
***************end of stopselfserv.c*********************************
//killserver is to forcefully kill all selfservs
*************killserver.c******************************************
int
main(int argc, char **argv)
{
//all declarations
fp = fopen("PIDBase","r");
if(fp != NULL)
{
while (fscanf(fp,"%d",&ptr) >= 0)
{
hprocess = OpenProcess(PROCESS_TERMINATE,TRUE,ptr);
if(hprocess != NULL){
i = TerminateProcess(hprocess,0);
}
else{
printf("\nCould not open process %d \n",ptr);
}
}
fclose(fp);
}else{
printf("\nCould not open the file\n");
return 1;
}
return 0;
}
***************end of killserver.c*********************************
In ssl.sh,
we can test this way
*********************************ssl.sh***************************
main:
ssl_init
rm PIDBase
start_selfserv
sleep 10
stopselfserv
killserver
exit
******************************************************************
I have the unix side also and I just have to combine them
with #ifdefs.
Thanks
Biswatosh
Comment 63•18 years ago
|
||
Biswatosh, no more code in bug comments, please!
Use attachments for that. Thanks.
Comment 64•18 years ago
|
||
Comment 65•18 years ago
|
||
Comment 66•18 years ago
|
||
Comment 67•18 years ago
|
||
(On comment #66)
1) Comment#64 Comment#65 Comment#66 contain the diff of selfserv.c, source code of stopselfserv.c and source code of killserver.c respectively.
2)stopselfserv.c stops multiple selfservers gracefully. On Windows side,it posts
message to all top level windows. On Unix side, it sends SIGUSR1 signal.
selfserver.c catches this message in windows and calls stop_server(). On unix, the signal handler deals with this Signal and eventually calls stop_server()
3)killserver.c ,on Windows side, calls TerminateProcess() by getting the PIDs from the PIDbase file,which is a dump of all pids of selfservs running.
It kills all selfservs.
On Unix side, it sends SIGKILL to all pids of selfservs.
3)In selfserv.c. on windows, it creates a thread, which waits for any message on port 8443. If such a message is caught, it calls stop_server(). Waiting for the Msg is done by calling PeekMessage() continuously.
If it receives, it calls stop_server().
Updated•18 years ago
|
Attachment #233991 -
Attachment is patch: true
Attachment #233991 -
Attachment mime type: application/octet-stream → text/plain
Updated•18 years ago
|
Attachment #233990 -
Attachment is patch: true
Attachment #233990 -
Attachment mime type: application/octet-stream → text/plain
Updated•18 years ago
|
Attachment #233989 -
Attachment is patch: true
Attachment #233989 -
Attachment mime type: application/octet-stream → text/plain
Comment 68•18 years ago
|
||
(In continuation to comment #45)
A solution to the problem of getting exit codes of all the selfservs running can be the following.
In selfserv.c, instead of exit(3) have Exit(3),instead of exit(1234), have now Exit(1234), that is ,instead of calling exit(some int), call Exit(some int).
And,define Exit(int i) as a function(or as a macro function),where we open a particular file,say "ExitStatus" and dump the exit code(i.e. i) along with the current pid to that and close the file and then call exit(i). This file "ExitStatus" would exist at the same level where PIDBase file is, that is in the client dir. of tests_results. So, to see the status, we need to see the ExitStatus file. We can ,if we wish,append this file to results.html as well. But,will that be required?
I tried that in my local machine and seems to work. I would like to get more opinions on other possible alternative solutions.
Reporter | ||
Comment 69•18 years ago
|
||
Comment on attachment 233989 [details] [diff] [review]
This is the cvs diff of selfserv.c
Biswatosh,
There are many problems with this patch.
- You need to check the status from the RegisterClass API call . And you should use something more distinctive than "Main" for your class name. You will probably get errors about the class already existing.
I suggest building a string of the format "selfserv<pid>", which is likely to be unique
- Why are you doing the shutdown only if sslport == 8443 ?
selfserv can operate on any port .
So can our QA ssl.sh script, by changing the PORT environment variable . We take advantage of this feature in tinderboxes and nightly builds that run on alternate ports concurrently.
So, your test for 8443 will break shutdown on alternate ports.
In order to implement shutdown of selfserv processes running on specific ports, you will need to come up with a technique that allows only the intended selfserv process(es) to process the shutdown message from the stopselfserv program .
- Your message waiting loop currently consumes 100% of the CPU because you are using the non-blocking PeekMessage call. You need to either add a PR_Sleep delay in your loop, or better yet, switch to using the blocking GetMessage API instead
- Nit: PeekMessage returns a BOOL . The Win32 API doc says the value is non-zero value if there is a message, but that is not necessarily the value 1, so you can't check for that specific value for success. You must check for a non-zero value.
- After the PeekMessage call (or GetMessage - see above), you don't check what type of message has been received. Any Windows program can send any message to any window, using the broadcast messages - which you know, since you used that feature in stopselfserv. You can't just check that a message was received, you must check the content of the message. With the code you wrote, selfserv will exit the first time it receives a message - any message !
The proper way to process a message would be to call TranslateMessage then DispatchMessage . Then the message would be passed to the window procedure, and you could add a case statement to it to process it. You might be able to use the standard WM_QUIT message, or you could define your own.
- on the PR_CreateThread call, please do not use PR_LOCAL_THREAD but PR_GLOBAL_THREAD, as previously mentioned.
Otherwise, your message loop thread may hang other NT fibers when/if it goes into a blocking system call, such as GetMessage
- It may be preferrable to use #ifdef XP_WIN32 rather than #ifdef _WINDOWS
- The indentation in your patch is very messed up. Please don't use any tabs; use an appropriate number of spaces instead, and align your braces
- in the WaitForMsg function, what is the "nrval" variable for ? It is unused . Same question for wMsgFilterMin, wMsgFilterMax, MsgNo, wParam, lParam, regmsg
- the msgrcvd variable is unnecessary . You could just call stop_server() directly before the break statement
- The following code isn't portable :
fprintf(fp,"%d\n",getpid());
That is because getpid() may not be available on all platforms. Uses SSL_GETPID instead
- I am not sure what your MainProc is supposed to accomplish . It seems taken from an example. I don't see why you are creating a button in WM_CREATE . But you need a case statement in it to call stop_server()
Attachment #233989 -
Flags: review-
Reporter | ||
Comment 70•18 years ago
|
||
Comment on attachment 233990 [details] [diff] [review]
This is to stop multiple servers gracefully
- In the Windows code block, you need to check if the RegisterWindowMessage call actually succeeded
The windows documentation states that :
"If the message is successfully registered, the return value is a message identifier in the range 0xC000 through 0xFFFF." so you need to check for that condition . Also note that the return type is UINT, not int .
- When posting the message, it would be much preferrable not to use HWND_BROADCAST . This sends the message to every window on the system . It would be better to locate the selfserv "windows" that you need to send the message to, and only send the message to those windows.
If you use my earlier suggestion that each selfserv create a windows based on its pid called "selfserv<pid>", you can use the FindWindow call to get all the window handles, based on all the pids listed in the pid file . Then you can post the message to each individual window. If you do this, you won't need to register a new window message type . You can just post the standard WM_QUIT message - as long as you process it in the selfserv window procedure.
- Why are you #defining PORT_ functions, when your program doesn't use any PORT_ functions ?
- You should #ifdef XP_WIN32 rather than _WINDOWS
- you need another #else in your #if statement . Non-Windows, non-Unix platforms aren't supported by this program . This means mainly OS/2 . You should have a #error statement within the #else block . You can also remove the #ifdef XP_OS2_VACPP since it isn't needed
- You need to fix the indentation
Attachment #233990 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Updated•18 years ago
|
Attachment #233990 -
Flags: review?(julien.pierre.bugs) → review-
Reporter | ||
Comment 71•18 years ago
|
||
Comment on attachment 233991 [details] [diff] [review]
This is to kill multiple servers by force
Most of my review comments for attachment 233990 [details] [diff] [review] also apply here .
There is one more problem in the windows section. Unfortunately, Windows is known for recycling process IDs nearly immediately. This is a big issue when killing processes, because if the selfserv process died gracefully, and then its process ID got reused immediately, you will be killing a different process than the one you intended ! Unfortunately this is a very real problem which I experienced during web server development with the CGI engine.
The only way to solve the problem is to check some properties of the process before terminating it to make sure that it is really the right selfserv. You can rely on things like process startup time, process executable name, and process parent id (it might be the same shell that started the kill program).
It gets worse. The only way to do this check is to use the undocumented windows PSAPI functions.
I had to go there once already, I don't know if we should do it again.
Attachment #233991 -
Flags: review-
Reporter | ||
Comment 72•18 years ago
|
||
This is an update on comment 69, the selfserv patch review :
- You must switch from PeekMessage to GetMessage . This is required to process all the messages in the message queue . PeekMessage with PR_Sleep wouldn't work well, because you would need to re-process the same messages over and over.
- For killselfserv, we probably don't want to use TerminateProcess due to the PID recycling issue I mentioned. Once the message passing works reliably with stopselfserv, we can make killselfserv use a different message, and selfserv can simply call exit() in response to that message, instead of calling stop_server() . So, we would have 2 different messages - a shutdown message, and a kill message.
Since stopselfserv and killselfserv will be very similar, they should be merged into a single program, with a command-line option to do either a graceful shutdown or a hard kill.
Comment 73•18 years ago
|
||
(In reply to review : comment #69 - comment #72)
Julien,
Thanks for your suggestions. I have already modified according to your
comments. Before submitting the patch,I need to ask your opinions on the following:
a)In the waitformsg thread,if an error is caught,what should be the action? Should it just return? Or call exit()? The reason for not exiting can be that if wait for the Msg fails,the main process should not exit. The Wait for Msg thread is just to facilitate the stopping of the server. So,why not continue with the main process?
b)In stopselfserv(), what action should I take when PostMessage() fails?
IMO, I should not return or call exit(). Because,for whatever reason the failure was for that particular pid, Posting the Msg should not stop for other pids of the servers.
In the extreme case,one can manually do this later on and even if we call exit(), we have to do manually this job for other pids. But,if by not exiting, some servers do get killed in the next iterations, why not allow that?
But,if the error is in opening the PIDBase file, then we should return.
c)MainProc() is now merely calling the Default Window Process(DefWindowProc).
It need not process the messages as my program does not expect it to do anything. It is there because ,I need to create a window. So,the idea is (a)
call GetMessage() and (b)if the message is to stop the server by force, call exit().If the message is to stop it by grace,call stop_server(). If the msg is neither, call GetMessage() again. And,it can know whether the msg is for stop or for grace killing by registering the two messages in both stopselfserv.c as well as in WaitforMsg(). One for 'by force' and one for 'by gracefully'. And, do I still need a Translate and DispatchMessage()? I dont see why I need that.
(d)SSL_GETPID() can replace getpid() if the defintion of SSL_GETPID() is moved to almost the top of the source code. That too,it should not be within
#ifdef DEBUG_nelsonb(which it is now).
(e)Now,there are only two source codes,one the modified selfserv.c and the other
stopselfserv.c. stopselfserv.c deals with stopping the server in both the modes(force and gracefully). Just, write "stopselfserv -f" or "stopselfserv -g"
,to mean stopping by force or gracefully resp., both for windows and unix.
(f)I should be submitting the patch by this coming Monday. As such the code is ready but I need to test it on Unix. On Windows it is doing fine with the changes that you have suggested.
Thanks
Comment 74•18 years ago
|
||
(In continuation to comment #73)
Julien,
Another thing I forgot to mention. Earlier,I was dumping the pids right in the start of the main() function. Now,I have shifted that to do_accepts(). The reason is that in the previous scenario,if multiple servers were created,the pid of the main(parent)server was also in the PIDBase file. And, no window was getting created
for that pid as the main server simply didnt call do_accept() ,which in turn is creating the WaitforMsg thread. All what the main server does is to create more children servers and wait for them. So, in stopselfserv(),when FindWindow() is called for all pids taken from PIDBase file,it can't locate the window for the main server. So,to me the best way was simply avoid putting the pid of the main server to the file. And,that can be done if we dump the pid in do_accept().
Now,let's see if any bad consequence can come out of it. If the main server is running and we kill all it's children by any of the modes, the main server will stop waiting and come out nicely.
Let's say all servers are running and we manually kill the main server. So, no process is waiting for the children! And,now if stopselfserv() is called, it anyway will kill all the children servers. And,now we need not bother about the main server because it is not there simply.
So, comment #73 and this would be in the code.
Comment 75•18 years ago
|
||
This implements review comments made by Julien on the earlier patch. The main changes are:
a)MainProc just calls DefWindowProc() as the system does not need to dispatch any registered message(that we register ourselves) to it.
b)The thread WaitforMsg() goes on calling GetMessage() until it gets the message for stopping the server,by force or gracefully.
c)Thus the usual model of getting the message and then translating and dispatching the message to the MainProc() is not used here.
Known limitations:
1)If the thread finds any error,it simply returns. It does not call exit().
The idea is to let the server run inspite of this thread failing for whatever reason. One still can manually stop the server any time but why let the server exit() just because the WaitforMsg() encounters an error.
2)PIDBase does not contain the pid of the parent server if multiple servers are created. For a single server, it's pid is however there. The reason is,there is no reason why the parent server's pid should be there in that file. The intention behind PIDBase is that, later on one can pick the pids from it and send signals/messages to it to stop. But,why send the signal to the parent? That is waiting for it's children servers and will quit the moment they quit.
3)If for some reason PIDBase is not created, the server exits. One may choose to ignore this error as well(by modifying the code),just like we do in the thread WaitforMsg().
4)The Windowclass name is now "selserv" + pid. And, on getting the message to do a hard kill, WaitforMsg() just calls exit(). As Julien pointed out, Windows may immediately allocate the very same pid to another new process. But,in all practicality,there is hardly or no chance for another selfserv to be created right at that point of time and get the same windowclassname. Hence,do we need to take more trouble by using the process attributes like creation time and all?
Attachment #235553 -
Flags: review?
Updated•18 years ago
|
Attachment #235553 -
Flags: review? → review?(julien.pierre.bugs)
Comment 76•18 years ago
|
||
This program is also based on Julien's earlier review comments.
Main changes are:
a)The way to execute it is to give stopselfserv -f or stopselfserv -g. First is for a hard kill and second is for stopping gracefully.
b)In windows OS, it finds windows based on the class "selfserv" + pid. It gets pid from PIDBase file. After it gets the handle to these window/s, it posts message in accordance with the command option(-f or -g).
c)On Unix, it sends either a KILL signal or a USR1 signal based on the option.
Limitations:
1)Even if a FindWindow() fails for a particular pid or if Postmessage() fails for a pid, the program does not stop. So,it is not that after any error whatsoever,the program exits. Here, it chooses to go on and kill other pids,if possible.
2)The program returns in the case of more serious errors,like failure to open the PIDBase file or failure in registering the message or a memory failure.
3)I was initially trying to use getopts but on finding that selfserv.c uses PL_CreateOptState(), I decided to use that. I tested it with right and wrong options and it worked on Linux and windows. But, I am not very sure if I missed
or did not miss anything to take care of.
Attachment #235554 -
Flags: review?(julien.pierre.bugs)
Comment 77•18 years ago
|
||
Comment on attachment 235553 [details] [diff] [review]
This is the cvs diff of selfserv.c
>Index: selfserv.c
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/cmd/selfserv/selfserv.c,v
>retrieving revision 1.75
>diff -p -u -r1.75 selfserv.c
>--- selfserv.c 23 Jun 2006 16:57:42 -0000 1.75
>+++ selfserv.c 26 Aug 2006 09:39:08 -0000
>@@ -52,6 +52,8 @@
>
> #if defined(_WINDOWS)
> #include <process.h> /* for getpid() */
>+#include <windows.h>
>+LRESULT APIENTRY MainProc(HWND,UINT,WPARAM,LPARAM);
> #endif
>
> #ifdef XP_OS2_VACPP
>@@ -153,6 +155,19 @@ static PRThread * acceptorThread;
>
> static PRLogModuleInfo *lm;
>
>+
>+#if defined(XP_UNIX) || defined(XP_OS2) || defined(XP_BEOS)
>+#define SSL_GETPID getpid
>+#elif defined(_WIN32_WCE)
>+#define SSL_GETPID GetCurrentProcessId
>+#elif defined(WIN32)
>+extern int __cdecl _getpid(void);
>+#define SSL_GETPID _getpid
>+#else
>+#define SSL_GETPID() 0
>+#endif
>+
>+
> /* Add custom password handler because SECU_GetModulePassword
> * makes automation of this program next to impossible.
> */
>@@ -889,6 +904,78 @@ void stop_server()
> PZ_TraceFlush();
> }
>
>+#ifdef XP_WIN32
>+void WaitforMsg(void*arg)
>+{
>+ HWND hwnd = NULL;
>+ char *winclassname = NULL;
>+ int PIDLen = 7;
>+ MSG Msg;
>+ unsigned int regmsgforgrace = 0;
>+ unsigned int regmsgforforce = 0;
>+ int correctmsgfound = -1;
>+ WNDCLASS wc;
>+ HINSTANCE hInst;
>+
>+ winclassname = (char*)calloc(strlen("selfserv") + PIDLen,0);
>+ if(winclassname == NULL){
>+ fprintf(stderr,"\nNot Enough memory\n");
>+ return;
>+ }
>+
>+ wc.cbClsExtra = 0;
>+ wc.cbWndExtra = 0;
>+ wc.hInstance = hInst;
>+ wc.lpfnWndProc = MainProc;
>+ memcpy(winclassname,"selfserv",strlen("selfserv"));
>+ sprintf(winclassname + strlen("selfserv"),"%d",SSL_GETPID());
>+ wc.lpszClassName = winclassname;
>+
>+
>+ if(RegisterClass(&wc) == 0){
>+ fprintf(stderr,"\nCould not register class\n");
>+ return;
>+ }
>+
>+ hwnd = CreateWindow(winclassname,winclassname,WS_OVERLAPPEDWINDOW,
>+ CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,CW_USEDEFAULT,(HWND)NULL,(HMENU)NULL,hInst,(LPVOID)NULL);
>+ if(hwnd == NULL){
>+ fprintf(stderr,"\nCould not create window\n");
>+ return;
>+ }
>+
>+ regmsgforgrace = RegisterWindowMessage("stopselfservwithgrace");
>+
>+ if(regmsgforgrace < 0xC000 || regmsgforgrace > 0xFFFF)
>+ return;
>+
>+ regmsgforforce = RegisterWindowMessage("stopselfservwithforce");
>+ if(regmsgforforce < 0xC000 || regmsgforforce > 0xFFFF)
>+ return;
>+
>+ while(correctmsgfound != 1){
>+
>+ if(GetMessage(&Msg,hwnd,0,0) != -1){
>+
>+ if(Msg.message == regmsgforgrace){
>+ correctmsgfound = 1;
>+ stop_server();
>+ return;
>+ }
>+ if(Msg.message == regmsgforforce){
>+ correctmsgfound = 1;
>+ exit(1);
>+ }
>+ }
>+ else{
>+ return;
>+ }
>+ }
>+
>+}
>+#endif
>+
>+
> int
> handle_connection(
> PRFileDesc *tcp_sock,
>@@ -1214,11 +1301,27 @@ do_accepts(
> #ifdef XP_UNIX
> struct sigaction act;
> #endif
>+ FILE *fp = NULL;
>+ fp = fopen("PIDBase","a");
>+ if(fp != NULL){
>+ fprintf(fp,"%d\n",SSL_GETPID());
>+ fclose(fp);
>+ }
>+ else{
>+ exit(1);
>+ }
>
> VLOG(("selfserv: do_accepts: starting"));
> PR_SetThreadPriority( PR_GetCurrentThread(), PR_PRIORITY_HIGH);
>
> acceptorThread = PR_GetCurrentThread();
>+
>+#ifdef XP_WIN32
>+ if(PR_CreateThread(PR_USER_THREAD,WaitforMsg,NULL,PR_PRIORITY_LOW,
>+ PR_GLOBAL_THREAD,PR_UNJOINABLE_THREAD,0) == NULL)
>+ exit(1) ;
>+#endif
>+
> #ifdef XP_UNIX
> /* set up the signal handler */
> act.sa_handler = sigusr1_handler;
>@@ -1600,17 +1703,6 @@ beAGoodParent(int argc, char **argv, int
>
> #ifdef DEBUG_nelsonb
>
>-#if defined(XP_UNIX) || defined(XP_OS2) || defined(XP_BEOS)
>-#define SSL_GETPID getpid
>-#elif defined(_WIN32_WCE)
>-#define SSL_GETPID GetCurrentProcessId
>-#elif defined(WIN32)
>-extern int __cdecl _getpid(void);
>-#define SSL_GETPID _getpid
>-#else
>-#define SSL_GETPID() 0
>-#endif
>-
> void
> WaitForDebugger(void)
> {
>@@ -2060,3 +2152,11 @@ main(int argc, char **argv)
> return 0;
> }
>
>+#ifdef XP_WIN32
>+
>+LRESULT APIENTRY MainProc(HWND hwnd,UINT msg,WPARAM wParam,LPARAM lParam)
>+{
>+ return DefWindowProc(hwnd,msg,wParam,lParam);
>+
>+}
>+#endif
Comment 78•18 years ago
|
||
(On comment #77)
Julien,
Patch 235553 has been modified in only one line,namely where CreateWindow() is being called. The parameters passed were modified. Now,CW_USEDEFAULT will be passed instead of some arbitrary numbers.
Two warnings come although during compilation. One that hInst is not initialised and one on the defintion of SSL_GETPID(). But,I felt we can ignore the first warning. The second one is on the line "extern int __cdecl _getpid(void);".
It says "_getpid: inconsistent dll linkage.dll export assumed".
Comment 79•18 years ago
|
||
stopselfserv,as mentioned before,looks for a PIDBase file and the triplet, namely selfserv.c,ssl.sh and stopselfserv work together as desired, that is, stopselfserv is able to kill selfservs started by ssl.sh but I feel stopselfserv can be enhanced further. For solving this bug it may not be required but we can make it behave like the kill command or perhaps more!
So,it would be great if stopselfserv can work in the following ways:
1)stopselfserv -f
This would look for the usual PIDBase and kill servers by force
2)stopselfserv -g
This would look for the usual PIDBase and kill servers gracefully
Points 1 and 2 are happening now.
3)stopselfserv -f -F Filename
Filename would be a filename's full path and stopselfserv would fetch PIDs from this file and kill servers by force
4)stopselfserv -g -F Filename
Filename would be a filename's full path and stopselfserv would fetch PIDs from this file and kill servers gracefully
5)stopselfserv -g 234 1234 10 345
stopselfserv would kill pids 234,1234,10,345 gracefully
6)stopselfserv -f 234 1234 10 345
stopselfserv would kill pids 234,1234,10,345 forcefully
and if possible,
7)stopselfserv -g 234 1234 10 345 -F filename
stopselfserv would kill pids 234,1234,10,345 and pids from filename
gracefully
8)stopselfserv -f 234 1234 10 345 -F filename
stopselfserv would kill pids 234,1234,10,345 and pids from filename
forcefully
Would that be too ambitious? But more important,can NSPR's command line option scanner PL_CreateOptState() handle these ? I felt that with the stopselfserv that I had written, this can solve the immediate problem and thus may close the bug but nevertheless, is limited. Is it worth a try then?
Reporter | ||
Comment 80•18 years ago
|
||
Comment on attachment 235553 [details] [diff] [review]
This is the cvs diff of selfserv.c
Biswatosh,
I will review the code first.
1) This patch is incomplete. You need to make selfserv link with additional libraries to pull the symbols you are using. This should be done in config.mk in an ifeq block.
When I applied your patch, I got the following link errors :
selfserv.obj : error LNK2001: unresolved external symbol __imp__GetMessageA@16
selfserv.obj : error LNK2001: unresolved external symbol __imp__RegisterWindowMe
ssageA@4
selfserv.obj : error LNK2001: unresolved external symbol __imp__CreateWindowExA@
48
selfserv.obj : error LNK2001: unresolved external symbol __imp__RegisterClassA@4
selfserv.obj : error LNK2001: unresolved external symbol __imp__DefWindowProcA@1
6
WINNT5.0_DBG.OBJ/selfserv.exe : fatal error LNK1120: 5 unresolved externals
2) when I apply this patch, I get compiler warnings :
C:/NSS/tip/mozilla/security/nss/cmd/selfserv/selfserv.c(164) : warning C4273: '_
getpid' : inconsistent dll linkage. dllexport assumed.
C:/NSS/tip/mozilla/security/nss/cmd/selfserv/selfserv.c(928) : warning C4700: lo
cal variable 'hInst' used without having been initialized
The first warning is because of the following :
extern int __cdecl _getpid(void);
The declaration needs to be pulled in from a header file, in this case process.h . I believe this is already included, so there is no need redeclare _getpid here.
The second warning is more serious. You are assigning hInst to a field of a structure, and you should not use it without initializing it in this case. You need to determine the proper value for hInst .
3) The following is wrong :
winclassname = (char*) calloc(strlen("selfserv") + PIDLen,0);
a) you are passing 0 as the second argument to calloc. This will creates elements of zero-size. Thus, I would expect calloc to return a NULL pointer. Did this code work for you ?
b) you should use PR_Calloc and PORT_Strlen instead of calloc and strlen .
c) you don't necessarily need to use strlen since the string is static
4) you have the following :
+ memcpy(winclassname,"selfserv",strlen("selfserv"));
a) It is much better to use a string function for this job, such as strcat.
b) you MUST check the boundaries of the string before appending, or this may result in a buffer overflow. In library code this could become an exploitable vulnerabiliy.
c) if you must use a static string and call strlen, at least use a variable for that string and refer to it, rather than reinstantiating it over and over. But you have called strlen("selfserv") already, so you could cache the result in a variable and avoid this call.
5) You have the following
+ sprintf(winclassname + strlen("selfserv"),"%d",SSL_GETPID());
a) Again, there should be no need to call strlen again.
b) since you are appending again, the boundary of the string should be checked again. sprintf is not safe to use because you don't know how big the string you are appending will be ahead of time. Use PR_snprintf or PR_smprintf .
6)
+ hwnd = CreateWindow(winclassname,winclassname,WS_OVERLAPPEDWINDOW,
+ 0,0,170,55,0,0,hInst,0);
What do the numeric values mean ?
7) + if(regmsgforgrace < 0xC000 || regmsgforgrace > 0xFFFF)
+ return;
You should print out something to stderr like you did for the CreateWindow failure, otherwise we will never know if this failure occurs. Same for the next check of RegisterWindowMessage .
8)
+ while(correctmsgfound != 1){
You declared correctmsgfound as int . This is a prime candidate for a boolean variable. PRBool should be used here.
Have you tested this loop to work successfully ?
9) you can't just exit(1) silently if PR_CreateThread fails.
You need to output some kind of error message so we can figure out why the program exits from the output if it does.
+#ifdef XP_WIN32
+ if(PR_CreateThread(PR_USER_THREAD,WaitforMsg,NULL,PR_PRIORITY_LOW,
+ PR_GLOBAL_THREAD,PR_UNJOINABLE_THREAD,0) == NULL)
+ exit(1) ;
+#endif
Attachment #235553 -
Flags: review?(julien.pierre.bugs) → review-
Reporter | ||
Comment 81•18 years ago
|
||
Biswatosh,
In response to comment 73 :
a) if waitformsg fails in any way, this should be a fatal error, and the process should exit, but not without an error explaining why.
The reason is that without a working shutdown mechanism, we will have left-over selfserv processes on our QA machines. This is bad because they need to be killed manually. On some platforms an errant selfserv will hold the port; on Windows it won't hold the port, but it will hold a reference to the directory in which it is running, which will be undeletable until selfserv is killed.
For debugging purposes, it would be OK to add assertions (PORT_Assert) for the error conditions.
b) Agreed, you should move on to the next PID if PostMessage fails . But you should also log the error so we know it happened.
c) Can you check if your MainProc ever actually gets invoked ?
The Windows doc and samples mention the need to call TranslateMessage. I think this feeds the message through the chain of window procedures. But if you are processing the message yourself before, perhaps it isn't needed. You need to experiment to find out. Have you verified that selfserv is able to distinguish between the 2 messages here ? Also, try to send another message that selfserv does not know about, and see what the behavior is.
Reporter | ||
Comment 82•18 years ago
|
||
Biswatosh,
In response to comment 74.
There are 2 ways to proceed :
1) As you suggest, only the children should be in the PID file. And for Windows, the parent shouldn't create a window for itself. This needs to be thoroughly tested to make sure the parent always goes away after the children do. Some log strings probably would help if they aren't already there.
2) only the parent would be in the PID file.
When the parent receives the request from stopselfserv, it would in turn kill its children, with the same mechanism. It would know the PID of its children, so no PID file is needed for them. Then parent selfserv would invoke stopselfserv on its children .
It's up to you to find what works most reliably. I'm fine with either one as long as it works.
Reporter | ||
Comment 83•18 years ago
|
||
Biswatosh,
In response to comment 75
1) I explained why we should exit in comment 81 a)
4) Right, we don't need to deal with the process attributes here since the kill program isn't using TerminateProcess based on a handle obtained only from a PID . So we aren't going to kill a random selfserv process. There is still a possibility that another selfserv would obtain the same pid - if 2 tinderboxes are running at the same time on the machine. But I don't believe we are doing that. Under normal operation, the selfserv would only exit in response to the window message, so this should not happen. If there is a bug and 2 tinderboxes are running, then one tinderbox could kill the other tinderbox' selfserv.
One solution for that is to use selfserv+PORT+PID for the window name . This will be guaranteed to be unique even for 2 simultaneous tinderboxes.
Reporter | ||
Comment 84•18 years ago
|
||
Comment on attachment 235554 [details] [diff] [review]
This is to stop selfserv either by force or gracefully
1) In comment 70, I requested a #error statement for OS/2 and other platforms. This means a compile-time error. Your patch update provides a runtime error message.
Your ifdefs's at the top of the file should be replaced by something like :
#if defined (XP_UNIX)
... do unix stuff
#elif defined(XP_WIN32)
...
#else
#error Unsupported platform.
/* This code has not been ported to OS/2 or other platforms. */
...
#endif
And remove the #ifdef XP_OS2_VACPP block, since the program won't work on OS/2 .
2) To get the definition for PORT_Strstr, you need to include secport.h .
There is no PORT_Malloc, but there is a PORT_Alloc .
3) I have the same comments about the string processing in this patch as for attachment 235553 [details] [diff] [review] . I believe PR_smprintf is the solution here. You can probably replace the code you wrote by something as simple as :
char* winclassname = PR_smprintf("selfserv%d", SSL_GETPID());
Use PR_smprintf_free to free the string when you are done using it.
Using PR_smprintf, it should be simple to add the port number to the name of the window to deal with the PID recycling problem as mentioned in comment 83 . One problem is how stopselferv will obtain the port number. This could simply be an extra argument (-p <port>), or an environment variable getenv("PORT"), which is already set in the script; or something from the file (which wouldn't be a PID file anymore). I recommend one of the first 2 choices.
Another thing :
On Unix, the process group feature is available. If the parent process uses it and create a process group, you no longer need to record the PIDs of each individual process. You can just send the signal (SIGUSR1 or SIGKILL) once to the entire process group, and then each process will get it.
I don't remember the Unix API to create a process group. I remember that in shell scripting, you send a signal to a group with kill -SIGNAL -GROUP
ie. the process group is like a negative process ID . I'm not sure if this is true too in C code.
This may simplify or solidy the Unix code for the multi-process case.
Unfortunately, I'm not aware that Windows has anything like process groups.
Attachment #235554 -
Flags: review?(julien.pierre.bugs) → review-
Reporter | ||
Comment 85•18 years ago
|
||
Biswatosh,
In response to comment 78 :
Please don't copy the code inside the bug again. If you have a patch update, you need to create a new attachment, and mark it as obsoleting the older one. Don't forget to ask for review on the new patch.
The change to CW_USEDEFAULT is good. The arbitrary numbers was my objection in comment 80, 6) .
The warnings should not be ignored, in particular the one about hInst. Read the API documentation for CreateWindow to find how to obtain the instance.
I already explained how to take care of the other warning in comment 80, 2).
Reporter | ||
Comment 86•18 years ago
|
||
Biswatosh,
In response to comment 79, I don't think you need to be quite that flexible in stopselfserv. The PIDs are always going to come from selfserv in a file called PIDbase. Allowing a PID to be passed on the command-line would be helpful only for manual killing. But usually that can be done just as well in the Windows task manager, or with the unix kill command. So, I wouldn't spend time on that, at least not before the rest is working reliably.
Allowing an input PID file might be useful to deal with path issues - eg. the script could pass the fully-qualified pathname of PIDbase . But then selfserv would probably also need the same argument to specify the location or name of its PID file .
So, of your proposals, I would retain 1) through 4) .
I would add that the port should be required (for example with -p <port>) on Windows to deal with the recycling issue. See comment 83 , 2) .
This argument could stay in the Unix version of stopselfserv, but its value would just be ignored since the fast recycling problem doesn't exist there.
Or you could just use getenv() to obtain the port number needed to build the window name.
Comment 87•18 years ago
|
||
(In reply to comment #80)
Julien,
Thanks for your review.
1)
Extremely sorry for not mentioning to you that
I had added the following LIBs to EXTRA_LIBS in selfserv/Makefile
KERNEL32.LIB
USER32.LIB
gdi32.LIB
Similarly,in cmd/stopselfserv/Makefile.
stopselfserv is a new folder in cmd(proposed).
And,thus I was not getting compiler errors.
2)
And, to your point 3 in comment #80 ,
yes, the code was working and the pointer returned, "winclassname" was not NULL.
But, that was a mistake. I should have given 1 as the second parameter.
According to man page, it should have returned NULL but then it did not. I tested a sample code in my Linux box and here too the same result. Is it not surprising?
I would like to answer some of your questions in comment #81 ,point c
1)I checked that MainProc is being called.
2) Yes,I had verified that the two registered messages were interpreted
as different. Infact,in stopselfserv.c, I had posted other window messages also.
And the GetMessage() loop was continuing because it still did not get the right messages it is waiting for. For example, I modified stopselfserv.c and did this:
PostMessage(hwnd,WM_QUIT,0,0);
PostMessage(hwnd,WM_QUIT,0,0);
PostMessage(hwnd,WM_KEYDOWN,0,0);
PostMessage(hwnd,regmsg,0,0);
PostMessage(hwnd,WM_QUIT,0,0);
3)To comment #82 point 1,
I did not use log strings, but judging from the output on the console with all
printf statements, it appears that the parent was always dying after all the children died.
4) To comment #84 Point 1.
Now I understand what you wanted in comment #70
I had thought that even if in run time it can tell that other platforms
are not supported, it should be good enough. Fine, I shall use #error to
make it fail it in compile time only.
5) A curioisty...
U mentioned in comment #85 ,that I should not ignore hInst warning. That, I should find the Instance using some windows functions. I wanted to know what could be the problem if we ignore it?
Reporter | ||
Comment 88•18 years ago
|
||
Biswatosh,
Regarding comment 87 :
1) You need to include any Makefile changes that you make in your patch. All changes need to be reviewed. Please provide an attachment that includes it. The library should be conditionally linked, only on Windows. The file to do it would be mozilla/security/nss/cmd/selfserv/config.mk . This file doesn't exist yet, so it should be created . You can look at mozilla/security/nss/lib/freebl/config.mk as an example a file that contains platform-specific linker rules. Also don't forget to add an "include config.mk" in the selfserv Makefile as rule (3) .
2) Yes, the behavior is somewhat surprising, calloc should return NULL if asked for a size of 0 .
I found info on this problem at http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/malloc.htm#calloc . It says :
Upon successful completion, the calloc subroutine returns a pointer to the allocated, zero-initialized array. If the size requested is 0, calloc returns NULL in normal circumstances. However, if the program was compiled with the macro _LINUX_SOURCE_COMPAT defined, calloc returns a valid pointer to a space of size 0.
So, it sounds like a bug in the Linux implementation of calloc ...
5) I am not sure what the problem would be if hInst is uninitialized . You need to read the API documentation for CreateWindow to find out what the consequence might be.
Comment 89•18 years ago
|
||
(In reply to comment #88 Point 5
Julien,
Here are some interesting points on HINSTANCE
1)If I dont initialize hInst ,it gives warning but wndclass gets registered and window gets created and no problem occurs in running multiple selfservs and then finding and stopping them.
2)The docs say that GetModuleHandle() returns the handle to the current module.
It says, just call this func with NULL as arg and u get the current HINSTANCE. HMODULE and HINSTANCE were different in WIN 16 but r same in WIN 32. If I apply this in my WaitforMsg(),it does not register saying that the parameter is invalid.
However, if I make a different pgm with a WinMain() and not main(), then GetModuleHandle() returns exactly what is passed to WinMain() as the first argument. And,the class gets registered. But replace WinMain() by main(),it fails.
3) If however, I make the wndclass member style as CS_GLOBALCLASS and then
register it with hInst taken from GetModulehandle(),RegisterClass() succeeds
while using main().
Now, as the pages I mention below says, HINSTANCE is there to differentiate windows with the same classname. But,in our case, if we are going to form a name based on "selfserv"+pid+port, then there can't be any clashes. In that case, why should i require a hInst? Thus, making the classstyle as CS_GLOBALCLASS should not bring any issues nor would ignoring the warning if we dont initialze hInst.
I dare say that even if we stick to the earlier method of "selfserv"+pid and not port, even then clashes should not happen.
Pls suggest.
I referred
http://www.haskell.org/pipermail/glasgow-haskell-users/2001-April/001827.html
http://blogs.msdn.com/oldnewthing/archive/2004/06/14/155107.aspx
http://blogs.msdn.com/oldnewthing/archive/2005/04/18/409205.aspx
Thanks
Comment 90•18 years ago
|
||
(On comment #89)
Julien,
1)I would like to make a correction of point 3.
CS_GLOBALCLASS does not help. The docs I referred did say that making style as
CS_GLOBALCLASS would make window manager ignore the hInst but this is not helping.
The only way I found registering a class was becoming successful was not instantiating hInst.
2)On comment #80 point 4 b.
I had already allocated winclassname big enough by assuming PID Length would not cross 7. May be, I can make it 9 or something. Would still there be a worry of buffer overflow?
Otherwise,I have to find the number of decimal digits in PID. For that,either I write a small function/loop or use some math function like log or something.
But,would I require all that if I already allocate wndclassname big enough?
Reporter | ||
Comment 91•18 years ago
|
||
Biswatosh,
Re: comment 89,
1) Leaving hInst uninitialized means that you may be passing a random value to the system API. That's why the compiler is warning you. Even if things appear to work in some cases, it may fail in others. So you must initialize hInst, the question is to what value it should be initialized.
2) What do you mean by the following :
"If I apply this in my WaitforMsg(),it does not register saying that the parameter is invalid."
Do you mean that you do :
hInst = GetModuleHandle(NULL);
Is hInst non-NULL at that point ?
And what is returning an error ? Is it the ensuing RegisterClass call that fails ? And how did you obtain the error ? Please be more specific.
Note that we are only registering the class in order to be able to create one window, in the same process that registered the class. There is no need for other processes to be able to find this class and create window of that window class. In other words, the class does not need to be global. So, CS_GLOBALCLASS probably shouldn't be used.
If the value returned by GetModuleHandle(NULL) does not work, try to use GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, (LPCTSTR) MainProc, &module) .
3) Consider the following situations about name clashes :
a) The first tinderbox starts selfserv on port 8443 and gets PID 1 . It creates a window called "selfserv1"
b) selfserv dies by itself during the QA test due to a bug
c) The second tinderbox starts selfserv on port 8444 and gets PID 1 because of Windows PID recycling. This server creates a window called "selfserv1"
d) The first tinderbox calls stopselfserv and sends the quit message to window "selfserv1" . This causes the new server from the second tinderbox to shut down, in other words, the wrong server.
e) We spend lots of time figuring out why the first server shut down for no apparent reason.
If you include the port as part of the name, then this clash cannot happen.
I looked at your code again and notice that you used the same name for the class and the window. I think it might be OK to use a non-unique name for the class, "selfserv". The class name is probably not important. But the window name matters and must be unique, in order for the messages to get routed properly with no clashes.
Re: comment 90,
1) Do you mean not initializing hInst ? Do you have any idea what value was passed in ?
2) One worry with static buffers is that even if you get it correct right now, if anything in the code gets changed at a later time, the static buffer may no longer be long enough, and the author of the change may not notice it. It is often preferrable best to avoid using static buffer unless no other option is available.
If you use PR_smprintf, it will compute the required size for the string, and allocate the right amount, so you don't have to write any extra code to find the number of digits in the pid - or to check any other boundaries, indeed. This greatly simplifies your code.
Comment 92•18 years ago
|
||
(In reply to comment #91 and Makefile patches)
1)Yes, I do :
hInst = GetModuleHandle(NULL);
2)Yes, hInst is non-NULL at that point.
3)GetModuleHandle(NULL) does not fail.
4)Yes, it is it the ensuing RegisterClass call that fails.
I obtain the error by calling GetLastError().
CS_GLOBALCLASS does not work as I mentioned before.
5)Till to try
GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, (LPCTSTR) MainProc,
&module) .
Actually, I tried yest and it was not recognising this function. The doc says that it is available in WinXP. I have WinXP but still it was complaining. Kernel.dll in system32 has GetModuleHandleExA and GetModuleHandleExW but not GetModuleHandleEx.
The doc http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getmodulehandle.asp
says
*****************
Requirements
Client Requires Windows Vista or Windows XP.
Server Requires Windows Server "Longhorn" or Windows Server 2003.
Use Kernel32.lib.
DLL Requires Kernel32.dll.
Unicode
Implemented as GetModuleHandleExW (Unicode) and GetModuleHandleExA (ANSI). Note that Unicode support on Windows Me/98/95 requires Microsoft Layer for Unicode.
*************************
Will delve into it soon and let you know.
6)Not much idea on what value was passed in with uninitialised hInst. But,I checked that it is not the same as the return value of GetModuleHandle(NULL).
However,as told before, if I use WinMain() in a separate program,hInst passed by the system is the same as returned by GetModuleHandle(NULL).
7)I am attaching the Makefile patches and it works in my WinXP box. But,I doubt whether this is the correct way to do it. I create a config.mk file in cmd/selfserv dir and one in cmd/stopselfserv dir.
Makefile is both the folders include the config.mk located in the same respective folders.
All what config.mk does is :
************
ifeq (,$(filter-out WIN%,$(OS_TARGET)))
EXTRA_SHARED_LIBS+= KERNEL32.LIB USER32.LIB
endif
************
and in Makefile, I just include this config file this way:
***************
....
....
include ../platrules.mk
include config.mk
***************
Comment 93•18 years ago
|
||
(Further to comment #92)
Another thing is reqd to compile stopselfserv.
In nss/cmd/manifest.mn just add stopselfserv to the DIRS list.
DIRS = lib \
$(ZLIB_SRCDIR) \
addbuiltin \
atob \
....
selfserv \
stopselfserv\
...
modutil \
$(NULL)
Comment 94•18 years ago
|
||
This and other patches (to be attached soon) are for compiling source files and they work on my windows and solaris boxes but may not be perfect.
I am attaching these just to help in testing my patches. I will have a good look at other makefiles and rewrite them again and submit for review.
Comment 95•18 years ago
|
||
Comment 96•18 years ago
|
||
Comment 97•18 years ago
|
||
Comment 98•18 years ago
|
||
Updated•18 years ago
|
Attachment #237291 -
Attachment description: This is both for selfserv and stopselfserv → This config.mk is both for selfserv and stopselfserv
Updated•18 years ago
|
Attachment #237292 -
Attachment description: This is selfserv specific and wont work for stopselfserv or others → This manifest.mn is selfserv specific and wont work for stopselfserv or others
Updated•18 years ago
|
Attachment #237293 -
Attachment description: This is stopselfserv specific → This manifest.mn is stopselfserv specific
Updated•18 years ago
|
Attachment #237294 -
Attachment description: This just adds stopselfserv and selfserv to the list of directories. → This manifest.mn in nss/cmd/ just adds stopselfserv and selfserv to the list of directories.
Updated•18 years ago
|
Attachment #237291 -
Attachment description: This config.mk is both for selfserv and stopselfserv → This config.mk is both for selfserv and stopselfserv.
To be placed in nss/cmd/selfserv/ and in nss/cmd/stopselfserv/
Updated•18 years ago
|
Attachment #237290 -
Attachment description: This is the Makefile for selfserv and it is the same for stopselfserv → This is the Makefile for selfserv and it is the same for stopselfserv and to placed in nss/cmd/selfserv and in nss/cmd/stopselfserv
Updated•18 years ago
|
Attachment #237292 -
Attachment description: This manifest.mn is selfserv specific and wont work for stopselfserv or others → This manifest.mn is selfserv specific and wont work for stopselfserv or others. To be put in nss/cmd/selfserv/
Updated•18 years ago
|
Attachment #237293 -
Attachment description: This manifest.mn is stopselfserv specific → This manifest.mn is stopselfserv specific.
To be put in nss/cmd/stopselfserv/
Reporter | ||
Comment 99•18 years ago
|
||
Biswatosh,
It is hard to review so many different patches. You should only need to have a single patch to review ideally . Here are some CVS tips to achieve that :
1) use the "cvs add" command to add the file you want (eg. new config.mk, stopselfserv) .
You need to do this with your real CVSROOT with a connection to the CVS server.
This will not actually check in anything.
2) use the -N argument to cvs diff to include new files in the diff
3) do the diff from a parent directory . This will includes all your changes from subdirectories at once.
Reporter | ||
Comment 100•18 years ago
|
||
Comment on attachment 237292 [details] [diff] [review]
This manifest.mn is selfserv specific and wont work for stopselfserv or others. To be put in nss/cmd/selfserv/
Changes to existing files must be attached as diffs, not as entire files.
Attachment #237292 -
Flags: review-
Reporter | ||
Comment 101•18 years ago
|
||
Comment on attachment 237294 [details] [diff] [review]
This manifest.mn in nss/cmd/ just adds stopselfserv and selfserv to the list of directories.
Changes to existing files must be attached as diffs, not entire files.
Attachment #237294 -
Flags: review-
Reporter | ||
Updated•18 years ago
|
Attachment #217969 -
Attachment is obsolete: true
Reporter | ||
Updated•18 years ago
|
Attachment #233989 -
Attachment is obsolete: true
Reporter | ||
Updated•18 years ago
|
Attachment #233990 -
Attachment is obsolete: true
Reporter | ||
Updated•18 years ago
|
Attachment #233991 -
Attachment is obsolete: true
Reporter | ||
Comment 102•18 years ago
|
||
Comment on attachment 237293 [details] [diff] [review]
This manifest.mn is stopselfserv specific.
To be put in nss/cmd/stopselfserv/
Looks fine, but this should be included as part of a single diff as previously mentioned.
Attachment #237293 -
Flags: review+
Reporter | ||
Comment 103•18 years ago
|
||
Comment on attachment 237292 [details] [diff] [review]
This manifest.mn is selfserv specific and wont work for stopselfserv or others. To be put in nss/cmd/selfserv/
After diffing, I also found that this attachment contained no changes.
Reporter | ||
Comment 104•18 years ago
|
||
This patch consolidates the 7 patches it obsoletes into one diff, as outlined in comment 99 . It is a diff taken from mozilla/security/nss/cmd . Please attach updates to your work in this format . I have tested that this builds properly, and it appears to run, but more changes are necessary as noted in comments 80-86 and 91 .
Attachment #235553 -
Attachment is obsolete: true
Attachment #235554 -
Attachment is obsolete: true
Attachment #237290 -
Attachment is obsolete: true
Attachment #237291 -
Attachment is obsolete: true
Attachment #237292 -
Attachment is obsolete: true
Attachment #237293 -
Attachment is obsolete: true
Attachment #237294 -
Attachment is obsolete: true
Comment 105•18 years ago
|
||
(In reply to comment #91 point 2 on the initialization of hInst)
Julien,
GetModuleHandleEx() and it's variation GetModuleHandleExA() is not getting compiled. It gives the error:
selfserv.obj : error LNK2001: unresolved external symbol __imp__GetModuleHandleE
xA@12
WIN954.0_DBG.OBJ/selfserv.exe : fatal error LNK1120: 1 unresolved externals
gmake: *** [WIN954.0_DBG.OBJ/selfserv.exe] Error 2
I use WinXP Professional and here are the different attempts I made to get it compiled but they were not successful.
In each however,GetModulehandle() gets compiled.
The kernel32.dll present in system32 has the following:
GetModuleHandleExA
GetModuleHandleExW
but not GetModuleHandleEx
Attempts:
a)I use OS_TARGET = WIN95
I use the winbase.h available in the machine.
I note that in the winbase.h, GetModuleHandleEx
,GetModuleHandleExA and GetModuleHandleExW are not defined.
b)I dont set OS_TARGET.
I use the winbase.h available in the machine.
c)I use OS_TARGET = WIN95
I download winbase.h from net .
I define _WIN32_WINNT as 0x0501 in winbase.h
and I copy the defintion of the
functions from there to the local winbase.h.
************************************************
#if _WIN32_WINNT > 0x0500 || defined(WINBASE_DECLARE_GET_MODULE_HANDLE_EX) || ISOLATION_AWARE_ENABLED
#define GET_MODULE_HANDLE_EX_FLAG_PIN (0x00000001)
#define GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT (0x00000002)
#define GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS (0x00000004)
WINBASEAPI
BOOL
WINAPI
GetModuleHandleExA(
IN DWORD dwFlags,
IN LPCSTR lpModuleName,
OUT HMODULE* phModule
);
WINBASEAPI
BOOL
WINAPI
GetModuleHandleExW(
IN DWORD dwFlags,
IN LPCWSTR lpModuleName,
OUT HMODULE* phModule
);
#ifdef UNICODE
#define GetModuleHandleEx GetModuleHandleExW
#else
#define GetModuleHandleEx GetModuleHandleExA
#endif // !UNICODE
#endif
************************************************
d)I do the same steps as in (c) but dont set OS_TARGET
e)I do the same steps as in (c) but set OS_TARGET as WINNT.
Now, the msdn doc( http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getmodulehandleex.asp ) says that it requires windows vista or XP or longhorn or win server 2003.
Can this be a problem for our program's portability?
Even if we manage compiling it by setting OS_TARGET or may be by including
some lines in header files, but this will not work by default on all windows platforms,unlike GeModuleHandle(). Would we want such a situation?
Assignee: biswatosh.chakraborty → nobody
Updated•18 years ago
|
Assignee: nobody → biswatosh.chakraborty
Reporter | ||
Comment 106•18 years ago
|
||
Biswatosh,
We have to support WIN95/WINNT, so GetModuleHandleEx will not work. Sorry to have led you in the wrong direction. You have to make things work with the normal GetModuleHandle .
Comment 107•18 years ago
|
||
(In reply to comment #106)
Julien,
I saw the link you sent http://www.flipcode.com/cgi-bin/fcarticles.cgi?show=63933.
That was really a good one. But unfortunately, problems are coming.
I tried this idea:
*******************
int WINAPI WinMain(HINSTANCE instance, HINSTANCE prev_instance, char* command_line, int show_command)
{
return main(__argc, __argv);
}
*********************
Now,I mention the various attempts.
1)In selfserv.c, declare hInstglobal as HINSTANCE hInstglobal = NULL as a global
variable and add this:
int WINAPI WinMain(HINSTANCE instance, HINSTANCE prev_instance, char* command_line, int show_command)
{
hInstglobal = instance;
return main(__argc, __argv);
}
In WaitforMsg(), for registration of the class, pass hInstglobal as the HINSTANCE. First problem that would come is,it won't get compiled because of the subsystem error. Declare subsystem:windows in config.mk then only winmain() will call main().
Otherwise,winmain() is avoided and main() is called directly.
The result: class does not get registered.
In WaitforMsg(), declare an uninitialsed HINSTANCE variable and pass that to the registerclass(), class gets registered.
2)Within WinMain() only,do all what is being done in waitformsg() except waiting for the message. That is, register a class and create a window there only.
For HINSTANCE,pass what is being passed to winMain() by the system.
Result: Class gets registered and window gets created.
In stopselfserv(), call FindWindow() by passing the right class name.
Result: It finds the window and posts the message to kill the process.
But, WaitforMsg() which now contains just the code for waiting for the message, does not get it.
3)Repeat (2) with slight changes. For example, compile stopselfserv.c with subsystem:windows and define a winmain() there too. From Winmain() ,just call main(__argc,__argv). Here the results are same as in (2)
(4)Repeat (2) by doing whatever main() of stopselfserv.c does, in Winmain(). Result: Same as (2).
Still analysing....
Comment 108•18 years ago
|
||
(In continuation to comment #107)
Julien,
(1)
More experiments were done and among all the attempts,only the following way looks successful. By successful, I mean (1)hInst gets initialised,class gets registered and window gets created,(2)thread runs and waits for msg,(3)stopselfserv() can find the window and post the message to it and (4)waitformsg() is able to get msg and act accordingly.
A sketch of the code is given,to give a rough idea...
**********selfserv.c****************************
HINSTANCE hInstglobal=NULL;//global var
char *winclassnameglobal= NULL ;//global var
WinMain(HINSTANCE hInst,...){
WNDCLASS wc;
wc.hInstance = hInst;
winclassnameglobal = "somename";
wc.lpszClassName = winclasnameglobal;
hInstglobal = hInst;
RegisterClass(wc);
main(__argc,__argv);
}
main(...) OR do_accepts()
{
...
PR_CreateThread(WaitforMsg)
...
}
WaitforMsg(){
...
hwnd = CreateWindow(winclassnameglobal,winclassnameglobal,...,hInstglobal,...,);
...
GetMessage(&Msg,hwnd,0,0) != -1);
...
}
****************stopselfserv.c****************************
unsigned int regmsg = 0;
char *winclassname ="somename"//same name given in selfserv.c in WinMain()
hwnd = FindWindow(winclassname,winclassname);
PostMessage(hwnd,regmsg,0,0) == 0);
************************************************************
Now,it seems to be working. And probably the only way.
(2)
selfserv.c is compiled by making SUBSYSTEMS:windows. Otherwise, winmain() will be ignored. Problem is with the debug messages. We dont want windows way of message outputs. Only console features. But, fprintfs with either stderr or stdout are not coming on the console. I may start startselfserv from console and check but dont we need the outputs on console, even when we start from ssl.sh? What do we do to get that?
(3)
Interestingly, suppose WinMain() registers the class,creates the window and then calls main(__argc,__argv). And let main() or do_accepts() create the thread for waitforMsg(). And suppose WaitforMsg() just waits for theMsg. We will see that although stopselfserv.exe will be able to find the window and post the message,
GetMessage() called in waitforMsg() will never be getting the message.
Now, just do a small change. Dont create a thread. Instead call the function waitforMsg() . It will be seen that getMessage()is able to get the message.
Another option.
Let now WinMain() create the thread for waitforMsg() and not main() or any other function. It will be seen,the thread does get created but GetMessage() is not successful.
But now from WinMain() call the function WaitforMsg() and dont create thread, you will see GetMessage() is able to get messages. We may doubt the NSPR function PR_CreateThread(). So,now call _beginthread() instead of the NSPR version. Same results follow for all the trials.
But ,let the thread itself create the window irrespective of where the registration of the class takes place, GetMessage() gets the message.
What could be the reason for this behaviour?
(4)
So, do I take the only approach working(mentioned the 1st para of this comment)?
Comment 109•18 years ago
|
||
(In continuation to comment #108)
Julien,
Another thing I forgot to mention.I make windowclass's
style as CS_GLOBALCLASS,or say,CS_SAVEBITS,
it gets registered. I make it NULL. it gets registered.
I dont assign anything to style, it does not get registered.
Comment 110•18 years ago
|
||
(More on comment #108)
Julien,
On the problem for debug outputs from ssl.sh,while using WinMain(),
I find this article interesting.
http://www.halcyon.com/~ast/dload/guicon.htm
Do you think it is a good idea to use that?
Reporter | ||
Comment 111•18 years ago
|
||
Biswatosh,
I spent some time debugging the code today. I added statements to print the error from GetLastError after all failed Windows system calls.
When doing:
wc.hInstance = hInst = GetModuleHandle(NULL);
RegisterClass(&wc) was failing with error 87 which is ERROR_INVALID_PARAMETER . Since the only parameter is a pointer to the structure, I concluded there must have been something wrong with it. So, I initialized the structure to 0s before filling the other fields :
memset(&wc, 0, sizeof(wc));
Then RegisterClass succeeded.
I'm taking over this bug because it has taken way too long to bring to completion.
Assignee: biswatosh.chakraborty → julien.pierre.bugs
Reporter | ||
Comment 112•18 years ago
|
||
This patch isn't for review. It is merely to record the progress I made today. I checked that stopselfserv works for both graceful and forced shutdowns in the OS_TARGET=WIN95 build, in both single and multi-process modes .
forced shutdown also works on OS_TARGET=WINNT . But graceful shutdown still fails in the OS_TARGET=WINNT build, because PR_Interrupt does not wake up PR_Accept (bug 347963). Also multi-process mode on WINNT is completely broken in selfserv. I couldn't get a single connection to complete when selfserv is started with -M 2 . That's independent of the shutdown problem, but is rather nasty. I don't yet know why that is.
Attachment #238345 -
Attachment is obsolete: true
Reporter | ||
Updated•18 years ago
|
Target Milestone: 3.11.3 → 3.11.4
Comment 113•18 years ago
|
||
The target milestone is already released. Resetting target milestone.
Target Milestone: 3.11.4 → ---
Reporter | ||
Updated•17 years ago
|
Target Milestone: --- → 3.12
Reporter | ||
Comment 115•16 years ago
|
||
Changing description to reflect that we are testing those leaks on Unix. The remaining part of the work is for Windows only.
Summary: selfserv is not being tested for reference leaks → selfserv is not being tested for reference leaks on Windows
Target Milestone: 3.12 → ---
Updated•15 years ago
|
Assignee: julien.pierre.boogz → nobody
Updated•15 years ago
|
Whiteboard: fix incomplete
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•