Closed Bug 34949 Opened 24 years ago Closed 24 years ago

Mork code is not 64 bit clean

Categories

(SeaMonkey :: General, defect, P3)

DEC
OSF/1
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jim_nance, Assigned: Bienvenu)

Details

Attachments

(4 files)

Running mozilla under Tru64 Unix generates unaligned access messages.
After looking at the code it appears that it makes assumptions that
are not valid on 64 bit machines.  For example, from mork.h:

    typedef unsigned long  mork_u4;  // make sure this is four bytes
    typedef long           mork_i4;  // make sure this is four bytes

Which I think is causing this problem:

Thread stopped on Unaligned Access
stopped at [void InitWeeBookAtom(class morkEnv*, const morkBuf&, class
morkAtomSpace*, mork_aid):495 0x30011042f9c]
    495   mAtom_Change = morkChange_kNil;
(ladebug) w
    490 void
    491 morkWeeBookAtom::InitWeeBookAtom(morkEnv* ev, const morkBuf& inBuf,
    492   morkAtomSpace* ioSpace, mork_aid inAid)
    493 {
    494   mAtom_Kind = 0;
>   495   mAtom_Change = morkChange_kNil;
    496   if ( ioSpace )
    497   {
    498     if ( inAid )
    499     {
(ladebug) p this
0x14017e054
(ladebug) 

Note that this is aligned on a 4 byte boundary, which is not appropriate for
a structure which contains longs, as this one mistakedly does.
I found some more stuff.  From db/mork/src/morkConfig.h:

/* ===== pooling ===== */

#define MORK_CONFIG_PTR_SIZE_4 1 /* sizeof(void*) == 4 */ 

//define MORK_CONFIG_ALIGN_8 1 /* must have 8 byte alignment */
Autoconfig should be setting up the stuff that is in the morkConfig.h
header, so I am adding cls to the CC list.
I changed the typedefs, and the defines to reflect the 64 bit hardware I am
running on, and that did not fix the unaligned access problems.  I just turned
off Arena support in Mork to see if that fixes the problem.

FWIW, this should be a problem under 64 bit AIX, IRIX, Solaris, & HPUX.  What
do those platforms do on unaligned accesses?
this bug belongs to david b, who is owning mork now.
Assignee: asadotzler → bienvenu
I'm not going to be able to do anything about this without some help from 
someone who knows about these 64 bit platforms.
wtc found some similar code in mdb.h

OK, but that doesn't help me :-(
bienvenu,
    The 64 bit problems are pretty simple.  Pointers and longs on 64 bit
platforms are 8 bytes instead of the usual 4 bytes.  If you look in
mork.h and mdb.h you will see comments where it is assumed that longs are
4 bytes.  The correct way to fix this is probably to change all the types
that assume a length to PRUint32 or PRInt16 or whatever PR type is appropriate.

Another issue is that there is a #define that is set if you have 4 byte
pointers.  This needs to be tied into the autoconf system somehow since
pointer size varies from platform to platform.

Finally there is the issue of alignment.  The address of a 64 bit
quantity like a pointer or a long must be divisible by 8.  If you use
new or malloc to obtain memory for your structures, this happens automatically.
Mork uses its own memory allocator, and it seems to align memory so that
it is divisable by 4 but not always by 8.  I dont know where this code
is and someone may need to track it down.  I have a report from wtc that
the unaligned access messages go away after fixing mdb.h, so that may have
fixed this problem.
using PR types is easy, now that I'm maintaining this code. I'll do that.

who can I ask about the autoconf system?

I know in theory about the sub-allocator used by mork, but I need to know if 
there's anything left to do here (wtc?)
Chris Seawood is a good autoconf contact.
It'd be cake to add a sizeof(long) == 8 test to configure that would set
USE_64BIT or SIZEOF_LONG_IS_8 or something.  
that would be very helpful, I think. Why hasn't anyone needed this before?
Status: NEW → ASSIGNED
It may be possible to do the check with a test like

if(sizeof(void*)==8) {
} else if(sizeof(void*)==4) {
}

statement.  This may be what other code is doing.  I dont know what
mork does with that define, but if you can use something like the
expression above, then you will not need autoconf changes.  Unfortunatly
you can not use sizeof() expressions in #if statements, so you would
have to change the actual code.
i'd rather not do it at runtime...mork is all set up for using a #define
Unfortunately, you're doing a runtime check either way.  The test in configure
has to be a runtime check.  Unfortunately, doing it in configure means that
cross-compiling will probably use the wrong value.  When cross-compiling,
configure will error on the side of caution and assume the runtime check would
have failed.
As far as I know, although sizeof (variablename) is a runtime operator, I
believe that sizeof (typename) is actually a compile-time operator (and can even
generate compile-time errors).  I don't have the K&R book nearby to check.
And one consequence of that is that there shouldn't be any cross-compilation
problems relating to such a test.
I did a clean build with the types in mork.h and mdb.h changed from long to
int, and with the defines in morkConfig.h set for 8 byte alignment 8 byte
pointers.  The unaligned access messages are gone.

Both wtc and I were having some problems last night wich SIGSEGVs, but this
seems to have been fixed by the rebuild.  This is not terriably supprising as
the alpha build is not using dependencies, so its easy to get it out of whack.

Are there any plans to fix any of these problems for M15?  Somehow I doubt it,
but it would be nice to at least change the variables that are supposed to be
4 bytes from longs to PRInt32s in mork.h and mdb.h.  This should be a fairly
safe change.
No, I was going to check it in for m16 - I had no idea anyone wanted it for m15.
Target Milestone: --- → M16
I was wrong about things working with different types and defines, the
SIGSEGV is still there.  I am debugging it now.
I have attached a patch that finishes fixing this problem.  The SIGSEGV
I was getting does not look like it was related to these changes.  I
tested this patch under Tru64 5.0.
I forgot to mention that this patch includes the patch that chris did
earlier for the autocon changes.

This is in my tree.  I will check it in if someone gives me permission.
Attached patch A corrected fixSplinter Review
Dauphin noticed that that last patch was wrong.  This one should be
better.
r=cls
the mork part looks good.

I already checked in the use of PRInt32 to mork.h and mdb.h

- david
I checked in the patch
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
OK, I was wrong.  There is a problem with crashes.  Its just not completely
reproducable, which is why I thought it was gone.  Here is what I know
so far.  We get a SEGV at:

stopped at [morkRowObject* morkRow::AcquireRowObject(class morkEnv*, class
morkStore*):250 0x3000f856b20]
    250   morkRowObject* ro = mRow_Object;

If I go up 1 frame I see:

>1  0x3000f856c14 in ((morkRow*)0x4017c1b0)->AcquireRowHandle(ev=0x14025ec00,
ioStore=0x1401fd800) "/house/jnance/mbuild/mozilla/db/mork/src/morkRow.cpp":271
    271   morkRowObject* object = this->AcquireRowObject(ev, ioStore);

The "this" pointer, is a 32 bit number, which wrong.  I assume that somewhere
someone has truncated the top 32 bits by casting to/from an integer.  I am
having some problems with the debugger, so I can not go up any further,
but I did put in some code and this is the second call to AcquireRowHandle().

I think what is happening is that the memory management code for mork
is returning this pointer after it has recycled it.  I dont know the
code very well.  Can anyone shed any light on how this works?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
David Mc who wrote this code has left the company - I know how to read the code
but I can't really tell you how things work more than you already know, I'm
afraid. It sounds like something going on in mork pools. Perhaps the handles are
truncating the pointers. You can try turning off the pools in morkConfig.h by
commenting out the appropriate #define.
OK, I made a little more progress.  Here is the code that extracts the
recycled row element pointer:

/*| get_assoc: read the key and/or value at index inPos
|*/
void
morkMap::get_assoc(void* outKey, void* outVal, mork_pos inPos) const
{
  mork_num valSize = this->FormValSize();
  if ( valSize && outVal ) /* map holds values? caller wants the value? */
  {

valSize is 4, which is wrong.  I assume this value was set by the person
who saved it, but I dont know where that was yet.  My stack frame looks
like this:

>0  0x3000f84d6dc in ((morkMap*)0x14047cb00)->get_assoc(outKey=0x0,
outVal=0x11fffaf38, inPos=0)
"/house/jnance/mbuild/mozilla/db/mork/src/morkMap.cpp":268
    268   if ( valSize && outVal ) /* map holds values? caller wants the value? 
>1  0x3000f84e614 in ((morkMap*)0x14047cb00)->Get(ev=0x14025ec00,
inKey=0x11fffaf40, outKey=0x0, outVal=0x11fffaf38, outChange=0x0)
"/house/jnance/mbuild/mozilla/db/mork/src/morkMap.cpp":683
    683       this->get_assoc(outKey, outVal, i);
#3  0x3000f859e4c in ((morkAtomRowMap*)0x14047cb00)->GetAid(ev=0x14025ec00,
inAid=129)
"/house/jnance/mbuild/mozilla/db/mork/src/morkAtomMap.h~alt~deccxx_CDDAEC6C":334
#4  0x3000f85ae60 in ((morkRowSpace*)0x140243f80)->FindRow(ev=0x14025ec00,
inCol=130, inYarn=0x11fffb0d8)
"/house/jnance/mbuild/mozilla/db/mork/src/morkRowSpace.cpp":554
#5  0x3000f85ff00 in ((morkStore*)0x1401fd800)->FindRow(ev=0x14025ec00,
inScope=128, inColumn=130, inYarn=0x11fffb0d8)
"/house/jnance/mbuild/mozilla/db/mork/src/morkStore.cpp":1204
#6  0x3000f83c104 in ((orkinStore*)0x14025e800)->FindRow(mev=0x14025ed48,
inRowScope=128, inColumn=130, inTargetCellValue=0x11fffb0d8,
outRowOid=0x11fffb0d0, acqRow=0x11fffb0c8)
"/house/jnance/mbuild/mozilla/db/mork/src/orkinStore.cpp":685
#7  0x3000f009dd4 in
((nsGlobalHistory*)0x1402a5fa0)->SetPageTitle(aURL=0x1409edd10="file:///house/jnance/atom.html",
aTitle=0x1409ee080)
"/house/jnance/mbuild/mozilla/xpfe/components/history/src/nsGlobalHistory.cpp":664
#8  0x3000d01d0e0 in ((nsDocShell*)0x140985300)->SetTitle(aTitle=0x1409ee080)
"/house/jnance/mbuild/mozilla/docshell/base/nsDocShell.cpp":1633
#9  0x30012bc6fb0 in ((nsHTMLDocument*)0x140a38800)->SetTitle(aTitle=const class
{ ... })
"/house/jnance/mbuild/mozilla/layout/html/document/src/nsHTMLDocument.cpp":792
I found the problem, and I have a fix.  I am not exactly sure its the
right fix, but it works.  The problem is we are storing pointers in a
hash table, and when we created the hash table, we told it the size of
the stuff we were putting in it was 4, which is wrong on 64 bit OSs.

This fix makes things run:

Index: morkAtomMap.cpp
===================================================================
RCS file: /cvsroot/mozilla/db/mork/src/morkAtomMap.cpp,v
retrieving revision 1.4
diff -u -r1.4 morkAtomMap.cpp
--- morkAtomMap.cpp     1999/11/06 03:17:06     1.4
+++ morkAtomMap.cpp     2000/04/21 20:41:55
@@ -423,7 +423,7 @@
 
 morkAtomRowMap::morkAtomRowMap(morkEnv* ev, const morkUsage& inUsage,
   nsIMdbHeap* ioHeap, nsIMdbHeap* ioSlotHeap, mork_column inIndexColumn)
-  : morkIntMap(ev, inUsage, sizeof(mork_aid), ioHeap, ioSlotHeap,
+  : morkIntMap(ev, inUsage, sizeof(mork_ip), ioHeap, ioSlotHeap,
     /*inHoldChanges*/ morkBool_kFalse)
 , mAtomRowMap_IndexColumn( inIndexColumn )
 {

I am not sure if the correct place to fix this is here, or down in
morkIntMap, but the result is going to be the same either way.
Comments?
I'm not sure either - I suspect your way is better. Actually, I think your way 
is right - the parameter you changed is the size of the value to the map, and 
the values are pointers. Looks good.
I found that there's a morkPointerMap class.
Do you think we should use that instead of
morkIntMap?

However, there is a comment regarding morkPointerMap
that seems wrong:
    // keySize for morkPointerMap equals sizeof(mork_u4)

Another idea is to define mork_aid as mork_ip as opposed
to mork_token.
I think mork_aid is an atom id, which probably is a 32 bit integer id. It looks 
like pointerMap is wrong, like you say, and may have different performance 
characteristics. If what you've done works, that seems good to me.
Please ignore my previous comments.  I didn't understand
the code.

I read the relevant mork code several times and now I finally
get it.  The value we store in morkAtomRowMap is (morkRow*),
so the morkAtomRowMap constructor should pass an inValSize
of sizeof(mork_ip) (== sizeof(morkRow*)) to the base class
morkIntMap constructor.  mork_aid is the key.  Since mork_aid
is defined as mork_token, which is defined as mork_u4, it is
correct to base morkAtomRowMap on morkIntMap, which uses
mork_u4 as the key.

Jim Nance's patch (id=7824) is correct. r=wtc@netscape.com.
isn't this fixed now? Does whoever fixed it want to take it and mark it fixed?
I verified that the latest patch (id=7824) has
been checked in.  Marked the bug fixed.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Sorry for the spam.  New QA Contact for Browser General.  Thanks for your help
Joseph (good luck with the new job) and welcome aboard Doron Rosenberg
QA Contact: jelwell → doronr
can one of you please Verify that this is FIxed.  Neither Doron nor I are going 
to be able to verify this.  Thanks.
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: