Closed Bug 303328 Opened 19 years ago Closed 1 year ago

Add exceptq support to os/2 port

Categories

(NSPR :: NSPR, enhancement)

x86
OS/2
enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: steve53, Unassigned)

Details

Attachments

(6 files, 12 obsolete files)

3.90 KB, patch
dragtext
: review?
nelson
Details | Diff | Splinter Review
3.01 KB, patch
wuno
: review+
Details | Diff | Splinter Review
10.02 KB, patch
Details | Diff | Splinter Review
2.86 KB, text/plain
Details
3.14 KB, text/plain
Details
3.14 KB, text/plain
Details
User-Agent:       Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.7.8) Gecko/20050530
Build Identifier: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.7.8) Gecko/20050530

Exceptq is an OS/2 specific facility to capture process state when and exception
occurs

Reproducible: Always
Status: UNCONFIRMED → NEW
Ever confirmed: true
I still don't see the significant difference with the patch I posted some time
ago in the newsgroup (or the one dynamically loading the DLL that I think I sent
to you). Unlike the small sample I created then I didn't try to set the handler
twice in the real patch. Is it just that you define the handler function as _ERR
instead of ULONG _System or APIENTRY? (I guess the definition of the four
function parameters isn't necessary but then why were they included in the
exceptq sample?)
Component: General → NSPR
Product: Mozilla Application Suite → NSPR
Version: unspecified → other
(In reply to comment #2)
> I still don't see the significant difference with the patch I posted some time
> ago in the newsgroup (or the one dynamically loading the DLL that I think I sent
> to you). Unlike the small sample I created then I didn't try to set the handler
> twice in the real patch.

Based on the copy of os2thred.c you sent my way, the code to set the handler
twice is there.  I can't tell what was or was not commented out when you did you
dynamic load testing.  You were definitely setting the handler twice using the
same registration record the the copy sample_gcc_dyn.c you sent my way and this
is the reason it looped.

>Is it just that you define the handler function as _ERR

This is just clean up and making use of what the toolkit provides.  The exceptq
code is ancient and not all that clean or efficient.  It's basic claim to fame
is that it works.  Note that xworkplace and warpin use the exceptq code
essentially unmodified.

Now that exceptq basically works within Mozilla (for me that is), I'm off to fix
up some defects in the exceptq code.  It does a poor job of recognizing 32-bit
code.  There are also issues with decoding codeview data, but that's something else.

I also need to hook the handler into thread 1 and I'm not yet quite sure where
this code needs to be.

I guess the important question is do you now get a .TRP file?
Argh, my code worked all the time, I was just too stupid to test it correctly!
As you said, this exception handler does not get set in the first thread and I
was producing crashes somewhere else in the code (not directly in ExcpStartFunc)
which always happened to be in that first thread...

> I guess the important question is do you now get a .TRP file?

Yes, they are just not very useful as you noticed yourself, but I do get them.
OK, I managed to also set the exceptq handler for the first thread. I did this
by moving Steven's exceptq loader code within os2thred.c to extra functions
called PR_OS2_Set/UnsetGeneralExcpHandler (they also need to get defined in
pprthred.h and included in nsprpub/pr/src/os2extra.def) but for now only in a
quick and dirty way. Then I add these to the ScopedFPHandler class in
nsAppRunner. Now I also get a .TRP file for a crash in the first thread, e.g.
when doing D&D over the about:config window.

Still, the EXCEPTQ #ifdefs are missing in nsAppRunner. I guess to do this
properly we should add an argument --enable-exceptq in configure.in that is off
by default and that sets EXCEPTQ in autoconf.mk so that we can easily switch it
on and off.
Comment on attachment 191802 [details] [diff] [review]
New rough patch, setting handler in also in nsAppRunner

My, that was a really bad patch. I set the handler for the first thread but
forgot to edit the copied names of the functions so it wasn't set for the other
threads any more (instead I set the FP handlers twice)...

Well, I was working on a better patch with proper integration and also the
possibility to unload the DLLs but somehow that ended up crashing the browser.

Additionally I noticed, that if the changes in xpfe/bootstrap/nsAppRunner.cpp
were finalized we would need the same changes in toolkit/xre/nsAppRunner.cpp.
Attachment #191802 - Attachment is obsolete: true
Steven, only now did I find your newsgroup posting where you recommend to do the
calls in _PR_MD_EARLY_INIT(). That makes way more sense then my approach, I
guess I just let you work on this. :-)
Here's an updated patch.  It now sets a handler for thread 1.  The test code
has been modified to trigger via an environment variable.  I tried to go with
what appeared to be the standard coding style.	Let me know if I wandered off
the path.  I am working on updates to exceptq that will make the output more
useful for our puposes.

Someone needs to give me a quick lesson on how at add XP_OS2_EXCEPTQ to
configure.  This will save me some code reading time.
Attachment #191559 - Attachment is obsolete: true
The patch looks fine to me, much better than what I had come up with. :-)
In the meantime I realized that the accepted practise seems to be to name
variables in the NSPR component with an NSPR_ prefix, so perhaps you should use
NSPR_OS2_EXCEPTQ. I will take a look on how to add it to configure, unless the
NSPR guys or Mike think that would be overkill?
This changes the configure process so that NSPR_OS2_EXCEPTQ gets defined for
the compiles in nsprpub/pr/src/md/os2.
Steven was working on this, so also assign to him.
Assignee: general → steve53
Where would I download exceptQ?
There are several copies of exceptq floating around.  The version I maintain is at

 http://home.earthlink.net/~steve53/os2diags/

Once I get it past what I consider beta, it will uploaded to Hobbes or Netlabs.
QA Contact: general → nspr
Steven, I don't remember what the status of this work was. I take it that you haven't worked on it the last two years. But at least I see a version of ExceptQ dated 2007-09-12 on your webpage. I vaguely remember that some work needed to be done to get ExceptQ to work with the large binaries of Mozilla packages, but I might confuse that. Is that Sept. 2007 version good enough for that, do you think it's still worthwhile to update the patch here?
Hi Peter,

As I recall, the patch was working within its limitations.  I've been doing work on ExceptQ on and off.  There's an even newer version at

http://home.earthlink.net/~steve53/mr2i/beta/exceptq_20080412.zip

I think you will find the revised output formatting will make analysis more efficient.

Michal and I did some work to resolve the large .sym file issues, but this got pushed down on the stack by lack of demand.  We had some ideas that might work, but nothing ever get implemented.  My current workaround is to trim the .map files so that mapsym will generate valid .sym files.  The trimming is a combination of shortening the C++ function names and deleting names that are unlikely to be used.

Another option if you are still using the IBM linker is the build with debug data.  The debug data can be stripped off and shipped as .dbg files.  Exceptq supports the NB04 format that the IBM linker generates.

I've given some though to supporting other debug formats, but nothing has been implemented.

Just a note to say there is an ExceptQ update at

http://home.earthlink.net/~steve53/os2diags/exceptq_6_7_dll_2008-05-24.zip and

http://home.earthlink.net/~steve53/os2diags/exceptq_6_7_src_2008-05-24.zip

This update adds support for oversized symbol files which will make it much more useful with the large maps generated for the Mozilla applications.

I will look at integrating this feature into the OpenWatcom mapsym dip support if there is a need.
Steven, does the Watcom linker work better for this?  It, at least the one that Knut built, works fine for Mozilla apps and is what I have been using for some time now.  I even tried WRC at one point but it wasn't compatible enough with RC to make the change over.  
Hi Andy,

It's not a linker issues.  It's the antique .sym file design which uses 16-bit offsets in the index tables.  If a block of data grows beyond 64KiB, none of the IBM supplied tools will handle the .sym file content correctly because the offsets will be truncated.  Exceptq implements simple heuristics to derive the correct 32-bit offsets.
It's only been five years since this bug was filed but it may happen yet.

The first two patches add code to NSPR and nsAppRunner to support Exceptq v7.0 which I've just released.  You can get the necessary dlls here:
  http://e-vertise.com/misc/exceptq70.zip

Because the copyright on Exceptq is pretty muddy, I don't think we'll be able to include it in the official Mozilla distros.  Instead, we'll have to ask users to download it from hobbes if they wish.  Note that the code I've added creates no dependencies on these dlls.  If they're present, they get loaded.  If not, then no crash reports from that machine.

The third patch is an optional debugging & test patch.  It reports as Exceptq is loaded and unloaded for each thread.  It also gives you the opportunity to test it out.  Add this to the environment, "SET EXCEPTQ_TEST=", then set it equal to either a number or the letter 'X'.  When set to a number, the app will crash shortly after the TID with that number is created.  When set to 'X', you can cause a crash by using a mouse chord (i.e. MB1+MB2) or by playing 1mb of audio through libsidneyaudio (just play an Ogg video).  Setting it to zero disables the auto-crash ('cause there's no TID 0).

Scripts to create .sym files for the moz-apps will be posted in the next day or two.
Assignee: steve53 → dragtext
Status: NEW → ASSIGNED
Hardware: Other → x86
Attached patch add exceptq support to NSPR (obsolete) — Splinter Review
Attachment #193020 - Attachment is obsolete: true
Attachment #193382 - Attachment is obsolete: true
This is really great, and may convince me to start my OS/2 partition again, to finally figure out why SM keeps crashing on me every 5 minutes.

As for shipping, that should not be affected by copyrights. The problem are licenses. Is there no way to track down (the) previous contributor(s), to assign a compatible license (BSD or MPL/LGPL)? The only file that has a license in the source ZIP file, is BSD-licensed (distorm.h), that would be OK. I am not sure what the role of unlicensed files is in the US.
(In reply to comment #23)

> As for shipping, that should not be affected by copyrights. The problem are
> licenses.

It was late when I wrote that - I meant license.

> Is there no way to track down (the) previous contributor(s)

This is IBM EWS s/w that was published in DevCon, so it may subject to either of those licenses.  I don't know if the author and contributors have the authority to change the terms.

Get http://e-vertise.com/misc/exceptq70-dev.zip and look at exceptq.c (in src.zip) for a complete list of everyone who ever messed with it.

Also in the "developer's edition" there's a sample trap report you can examine (the result of using a mouse-chord in Firefox).
EWS is a mess. Some tools have a BSD-like license but with a special IBM paragraph in it, some don't list anything. Didn't find a pre-Steven exceptq package anywhere to check what was in there. According to exceptq.c 4 ex-IBM employees would have to be tracked down. That sounds like more work than I want to do. ;-) So let's forget about that. (In any case, RWS was a success story despite the extra download, exceptq could be, too.)

I already looked at the TRP file, that info should be very helpful in most cases, much better than searching for POPUPLOG.OS2 addresses through .map files as I have done too many times in the last years (although I am not sure if that is the best example, --I hate event handling code).
Exceptq is not EWS in the sense that I would use the term.  Typical EWS was published in binary form on the EWS site and always included license text, best I can recall.

Exceptq was originally IBM Internal software and labeled as such.  When the Developer Connection article was published, the sources were republished with all the IBM Internal notices removed.  The sources were also included on the Devcon 12 CDs.

The DevCon CD sources include a .pkg file which states

 :nick.EXCEPT      :sec.IBM Internal Use Only        :disk.********            
 ...
 :lic.By placing material on this conference, I agree to grant                
  IBM a non-exclusive, royalty-free license for the material as       
  set forth in the LICENSE AGREEMNT file on this conference.

This file seems to have been omitted by whoever uploaded the files to os2site and elsewhere.

A bit more hunting helped me recall that I have numerous html copies Developer Connection License Agreements.  The Developer Connection Release 2 License Agreement states

(8)  The PROGRAM may contain sample programs that are 
furnished by IBM as examples. These examples have not been thoroughly 
tested under all conditions. IBM, therefore, cannot guarantee or 
imply reliability, serviceability, or function of these sample 
programs. You may copy, modify, and distribute these sample programs 
in any form without payment to IBM, for the purposes of 
developing, using, marketing and distributing application programs 
conforming to the application programming interface for the operating 
platform for which the sample programs are written, provided that: 
1) you agree to defend, hold harmless and indemnify IBM from 
and against any and all claims, liabilities, damages, expenses 
and costs arising out of or in connection with your use or 
distribution of the sample programs, modifications thereof, or your 
application program(s); and 2) you do not state that IBM certifies or 
guarantees the operation of the sample programs or modifications 
thereof, with any hardware and/or software. Each copy or partial 
copy of sample programs or any modifications thereof, must 
include a copyright notice as follows: "(C) Copyright (your company 
name) (year). Portions of this code are derived from IBM Sample 
Programs. (C) Copyright IBM Corp. 1997-1999. All rights reserved."

The only hitch is that exceptq may have been removed from the CDs by the time this license was published.  However, I'm pretty sure it was still available from the Developer Connection website.

It turns out Marc Fiammante still works for IBM, although I don't know if he can   do anything to clarify the licensing.  A while ago I ran across http://fr.linkedin.com/pub/marc-fiammante/2/74a/230
(In reply to comment #19)
> Scripts to create .sym files for the moz-apps will be posted in the next day
> or two.

Here are scripts for Firefox, Seamonkey, and Thunderbird.  They require the use of "Remap" which I've just released.  You can get it from here:
  http://e-vertise.com/misc/remap10.zip

The scripts for FF & SM are improvements over the ones Dave and Walter already have.  Most notable, they list errors at the end of the processing so those messages won't scroll off the screen.  I couldn't test the one for TB since I don't have a build of it, but it's almost identical to SM's script which works nicely.
trivial update to accommodate bitrot
Attachment #462674 - Attachment is obsolete: true
Comment on attachment 462673 [details] [diff] [review]
add exceptq support to NSPR

This RFE has some patches/files that are clearly marked "not for review", and 
others that are not marked at all, regarding review.  It has no patches or 
files that are marked as awaiting review, which is why it's languishing, 
I suspect.  If the author of any of these files/patches wants them to be 
reviewed, he should explicitly mark them to request review.  

Having said that, I will offer some unsolicited review feedback.  Please correct this observation if it is incorrect.  The first patch above appears to want to extend NSPR's Platform-independent API in a way that makes it become platform dependent.  It would add a public API function that is unlike all other PR_* Functions in that it is defined to exist only on one platform.  

As demonstrated by the last patch above, use of this new function requires 
that it be bracketed by platform dependent ifdefs in the caller (outside of NSPR).  In this respect it fails to accomplish the purpose of NSPR, namely to provide a virtual platform API whose functions are all implemented on all NSPR-supported platforms.  NSPR is about eliminating platform-dependent 
ifdefs in the callers/users of NSPR's APIs, not necessitating new ifdefs.  
Ifdefs are supposed to be buried inside NSPR so that NSPR's users need not 
be bothered with them.  

Observe how NSPR implement's WinNT's IO-completion ports in a manner that is totally transparent to the caller, being completely buried in the file/socket 
IO abstraction.   Can OS/2's exceptq feature be similarly buried in some 
NSPR API?
Attachment #462673 - Flags: review-
(In reply to comment #32)

> Can OS/2's exceptq feature be similarly buried in some NSPR API?

That's exactly how this patch is structured - with one exception.  For any thread created by NSPR, Exceptq (an exception reporting facility) is automatically installed if the supporting DLL is present.  The calling code is wholly unaware of this.

> use of this new function requires that it be bracketed by platform dependent
> ifdefs in the caller

While the function may be invoked many times from within NSPR (sans ifdefs), it is invoked externally only once:   in the app's startup routine on the *primary* thread.  Since that thread is not created by NSPR, how else can it use the same facilities as the NSPR-created threads other than calling them directly?

> It would add a public API function that is unlike all other PR_* Functions
> in that it is defined to exist only on one platform.

The new function is patterned after one that has been in place for at least 5 years.  It too is platform-specific but seems not to have caused a stir.

> This RFE [...] has no patches or files that are marked as awaiting review,
> which is why it's languishing

It may have "languished" for 5 years but it is currently under active development.  Since I have the impression that it is extremely difficult to get anything committed to NSPR, I have chosen not to seek review until I'm quite sure that the NSPR component will not need any further revisions.  When the time comes, perhaps you would like to do the honors.
bitrot update
Attachment #462675 - Attachment is obsolete: true
Per comment #32, Exceptq support within NSPR is now completely transparent.  There are no external references to the new code.
Attachment #462673 - Attachment is obsolete: true
In light of comment #32, this duplicates the code previously found only in NSPR and puts it in nsAppRunner as well.  On the bright side, this provides a way to add Exceptq support to modules which create threads but are not NSPR-aware (e.g. libvpx).
Attachment #500150 - Attachment is obsolete: true
Attachment #500563 - Attachment is obsolete: true
Attachment #463837 - Attachment is obsolete: true
Attachment #463838 - Attachment is obsolete: true
Attachment #463839 - Attachment is obsolete: true
The patches for nsAppRunner and NSPR should be the "final" versions.

The scripts to create .xqs files are for OS/2 builders who want to create symbol files in the new XQS format supported by Exceptq v7.1.
Rich, if these patches are "final", why are they not for review?
Comment on attachment 510162 [details] [diff] [review]
add exceptq support to NSPR - v2

(In reply to comment #42)
> Rich, if these patches are "final", why are they not for review?

It was a matter of identifying the right parties to do so.  Since you seem to be soliciting the job, could you do the honors for the NSPR patch?
Attachment #510162 - Flags: review?(nelson)
Attachment #510163 - Flags: review?(wuno)
Comment on attachment 510162 [details] [diff] [review]
add exceptq support to NSPR - v2

some reviewer questions.
1) why is it necessary to do this code in two places, in NSPR and also in naAppRunner?
2) Doesn't this leak the loaded EXCEPTQ module?
On some systems, a loaded library that is not unloaded is counted as a 
memory leak.  If NSPR loads it, then NSPR should also unload it, I think,
when NSPR is shut down.  see PR_Cleanup.  Does this sound right to you?
(In reply to comment #44)
> 1) why is it necessary to do this code in two places, in NSPR and also in
> naAppRunner?

As noted in comment #33, this exception handler has to be installed on the app's primary thread as well as NSPR-created threads.  Since calling the code in NSPR was deemed inappropriate, it became necessary to duplicate it.  Comment #36 identifies a benefit provided by this arrangement.

> 2) Doesn't this leak the loaded EXCEPTQ module?

Not at all.  The dll is linked to the app this way to avoid creating a runtime dependency (it may not be present or may be a back-level version).  Once loaded, it should never, ever be unloaded except by the OS since the OS will call into it as the application shuts down.
Why not have NSPR install it in the primary thread in NSPR initialization?
(In reply to comment #46)
> Why not have NSPR install it in the primary thread in NSPR initialization?

OS/2 uses a chain of structures on the stack to establish the order in which it calls exception handlers.  The most recently installed, i.e. the one highest on the stack, is called first.

Each structure is a local variable belonging to the function that installed the handler.  Consequently, the handler must be removed from the chain before the structure goes out of scope when the function returns.  Failure to do so will generate an unrecoverable exception when the OS tries to use the chain.

This means you can't just call some helper function to install your handler unless you can also pass it a pointer to a structure that will remain in scope for an extended period - preferably, for the life of the thread.

Since I couldn't find any NSPR initialization function that accepts an arbitrary argument that could be used for this purpose, the primary thread requires either new non-NSPR code or external access to platform-specific code within NSPR.
Wan-Teh, I seem to recall that some other OS requires NSPR to so something on 
the process's primordial thread first, but I do not recall which OS it is, 
nor how NSPR ensures that that operation is done on the primordial thread 
when it is first invoked.  Can you refresh my memory about this?
Comment on attachment 510162 [details] [diff] [review]
add exceptq support to NSPR - v2

Nelson: the only such OS I remember is HP-UX 9, back in 1996-1998.
The source code is in the _MD_EarlyInit function for HP-UX:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/md/unix/hpux.c&rev=3.7&mark=82#79

The new PR_OS2_SetExceptqHandler and PR_OS2_UnsetExceptqHandler
functions added in this patch don't need to be defined with
PR_IMPLEMENT (which causes them to be exported from nspr4.dll)
because these two functions are only called by the static
function ExcpStartFunc.
Depends on: 705454
No longer depends on: 705454
Comment on attachment 510163 [details] [diff] [review]
add exceptq support to nsAppRunner - v3

Review of attachment 510163 [details] [diff] [review]:
-----------------------------------------------------------------

works well since several years
Attachment #510163 - Flags: review?(wuno) → review+
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: dragtext → nobody
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.