Mork code is not 64 bit clean

RESOLVED FIXED in M16

Status

SeaMonkey
General
P3
normal
RESOLVED FIXED
18 years ago
13 years ago

People

(Reporter: jim_nance, Assigned: Bienvenu)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Reporter)

Description

18 years ago
Running mozilla under Tru64 Unix generates unaligned access messages.
After looking at the code it appears that it makes assumptions that
are not valid on 64 bit machines.  For example, from mork.h:

    typedef unsigned long  mork_u4;  // make sure this is four bytes
    typedef long           mork_i4;  // make sure this is four bytes

Which I think is causing this problem:

Thread stopped on Unaligned Access
stopped at [void InitWeeBookAtom(class morkEnv*, const morkBuf&, class
morkAtomSpace*, mork_aid):495 0x30011042f9c]
    495   mAtom_Change = morkChange_kNil;
(ladebug) w
    490 void
    491 morkWeeBookAtom::InitWeeBookAtom(morkEnv* ev, const morkBuf& inBuf,
    492   morkAtomSpace* ioSpace, mork_aid inAid)
    493 {
    494   mAtom_Kind = 0;
>   495   mAtom_Change = morkChange_kNil;
    496   if ( ioSpace )
    497   {
    498     if ( inAid )
    499     {
(ladebug) p this
0x14017e054
(ladebug) 

Note that this is aligned on a 4 byte boundary, which is not appropriate for
a structure which contains longs, as this one mistakedly does.
(Reporter)

Comment 1

18 years ago
I found some more stuff.  From db/mork/src/morkConfig.h:

/* ===== pooling ===== */

#define MORK_CONFIG_PTR_SIZE_4 1 /* sizeof(void*) == 4 */ 

//define MORK_CONFIG_ALIGN_8 1 /* must have 8 byte alignment */
(Reporter)

Comment 2

18 years ago
Autoconfig should be setting up the stuff that is in the morkConfig.h
header, so I am adding cls to the CC list.
(Reporter)

Comment 3

18 years ago
I changed the typedefs, and the defines to reflect the 64 bit hardware I am
running on, and that did not fix the unaligned access problems.  I just turned
off Arena support in Mork to see if that fixes the problem.

FWIW, this should be a problem under 64 bit AIX, IRIX, Solaris, & HPUX.  What
do those platforms do on unaligned accesses?
this bug belongs to david b, who is owning mork now.
Assignee: asadotzler → bienvenu
(Assignee)

Comment 5

18 years ago
I'm not going to be able to do anything about this without some help from 
someone who knows about these 64 bit platforms.
(Reporter)

Comment 6

18 years ago
wtc found some similar code in mdb.h

(Assignee)

Comment 7

18 years ago
OK, but that doesn't help me :-(
(Reporter)

Comment 8

18 years ago
bienvenu,
    The 64 bit problems are pretty simple.  Pointers and longs on 64 bit
platforms are 8 bytes instead of the usual 4 bytes.  If you look in
mork.h and mdb.h you will see comments where it is assumed that longs are
4 bytes.  The correct way to fix this is probably to change all the types
that assume a length to PRUint32 or PRInt16 or whatever PR type is appropriate.

Another issue is that there is a #define that is set if you have 4 byte
pointers.  This needs to be tied into the autoconf system somehow since
pointer size varies from platform to platform.

Finally there is the issue of alignment.  The address of a 64 bit
quantity like a pointer or a long must be divisible by 8.  If you use
new or malloc to obtain memory for your structures, this happens automatically.
Mork uses its own memory allocator, and it seems to align memory so that
it is divisable by 4 but not always by 8.  I dont know where this code
is and someone may need to track it down.  I have a report from wtc that
the unaligned access messages go away after fixing mdb.h, so that may have
fixed this problem.
(Assignee)

Comment 9

18 years ago
using PR types is easy, now that I'm maintaining this code. I'll do that.

who can I ask about the autoconf system?

I know in theory about the sub-allocator used by mork, but I need to know if 
there's anything left to do here (wtc?)
(Reporter)

Comment 10

18 years ago
Chris Seawood is a good autoconf contact.

Comment 11

18 years ago
It'd be cake to add a sizeof(long) == 8 test to configure that would set
USE_64BIT or SIZEOF_LONG_IS_8 or something.  
(Assignee)

Comment 12

18 years ago
that would be very helpful, I think. Why hasn't anyone needed this before?
Status: NEW → ASSIGNED
(Reporter)

Comment 13

18 years ago
It may be possible to do the check with a test like

if(sizeof(void*)==8) {
} else if(sizeof(void*)==4) {
}

statement.  This may be what other code is doing.  I dont know what
mork does with that define, but if you can use something like the
expression above, then you will not need autoconf changes.  Unfortunatly
you can not use sizeof() expressions in #if statements, so you would
have to change the actual code.
(Assignee)

Comment 14

18 years ago
i'd rather not do it at runtime...mork is all set up for using a #define

Comment 15

18 years ago
Unfortunately, you're doing a runtime check either way.  The test in configure
has to be a runtime check.  Unfortunately, doing it in configure means that
cross-compiling will probably use the wrong value.  When cross-compiling,
configure will error on the side of caution and assume the runtime check would
have failed.

Comment 16

18 years ago
Created attachment 7357 [details] [diff] [review]
Set HAVE_64BIT_OS if runtime check for sizeof(long)==8 is true
As far as I know, although sizeof (variablename) is a runtime operator, I
believe that sizeof (typename) is actually a compile-time operator (and can even
generate compile-time errors).  I don't have the K&R book nearby to check.
And one consequence of that is that there shouldn't be any cross-compilation
problems relating to such a test.
(Reporter)

Comment 19

18 years ago
I did a clean build with the types in mork.h and mdb.h changed from long to
int, and with the defines in morkConfig.h set for 8 byte alignment 8 byte
pointers.  The unaligned access messages are gone.

Both wtc and I were having some problems last night wich SIGSEGVs, but this
seems to have been fixed by the rebuild.  This is not terriably supprising as
the alpha build is not using dependencies, so its easy to get it out of whack.

Are there any plans to fix any of these problems for M15?  Somehow I doubt it,
but it would be nice to at least change the variables that are supposed to be
4 bytes from longs to PRInt32s in mork.h and mdb.h.  This should be a fairly
safe change.
(Assignee)

Comment 20

18 years ago
No, I was going to check it in for m16 - I had no idea anyone wanted it for m15.
Target Milestone: --- → M16
(Reporter)

Comment 21

18 years ago
I was wrong about things working with different types and defines, the
SIGSEGV is still there.  I am debugging it now.
(Reporter)

Comment 22

18 years ago
I have attached a patch that finishes fixing this problem.  The SIGSEGV
I was getting does not look like it was related to these changes.  I
tested this patch under Tru64 5.0.
(Reporter)

Comment 23

18 years ago
Created attachment 7622 [details] [diff] [review]
Patch to fix.  Includes Chrises prior patch
(Reporter)

Comment 24

18 years ago
I forgot to mention that this patch includes the patch that chris did
earlier for the autocon changes.

This is in my tree.  I will check it in if someone gives me permission.
(Reporter)

Comment 25

18 years ago
Created attachment 7623 [details] [diff] [review]
A corrected fix
(Reporter)

Comment 26

18 years ago
Dauphin noticed that that last patch was wrong.  This one should be
better.

Comment 27

18 years ago
r=cls
(Assignee)

Comment 28

18 years ago
the mork part looks good.

I already checked in the use of PRInt32 to mork.h and mdb.h

- david
(Reporter)

Comment 29

18 years ago
I checked in the patch
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
(Reporter)

Comment 30

18 years ago
OK, I was wrong.  There is a problem with crashes.  Its just not completely
reproducable, which is why I thought it was gone.  Here is what I know
so far.  We get a SEGV at:

stopped at [morkRowObject* morkRow::AcquireRowObject(class morkEnv*, class
morkStore*):250 0x3000f856b20]
    250   morkRowObject* ro = mRow_Object;

If I go up 1 frame I see:

>1  0x3000f856c14 in ((morkRow*)0x4017c1b0)->AcquireRowHandle(ev=0x14025ec00,
ioStore=0x1401fd800) "/house/jnance/mbuild/mozilla/db/mork/src/morkRow.cpp":271
    271   morkRowObject* object = this->AcquireRowObject(ev, ioStore);

The "this" pointer, is a 32 bit number, which wrong.  I assume that somewhere
someone has truncated the top 32 bits by casting to/from an integer.  I am
having some problems with the debugger, so I can not go up any further,
but I did put in some code and this is the second call to AcquireRowHandle().

I think what is happening is that the memory management code for mork
is returning this pointer after it has recycled it.  I dont know the
code very well.  Can anyone shed any light on how this works?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 31

18 years ago
David Mc who wrote this code has left the company - I know how to read the code
but I can't really tell you how things work more than you already know, I'm
afraid. It sounds like something going on in mork pools. Perhaps the handles are
truncating the pointers. You can try turning off the pools in morkConfig.h by
commenting out the appropriate #define.
(Reporter)

Comment 32

18 years ago
OK, I made a little more progress.  Here is the code that extracts the
recycled row element pointer:

/*| get_assoc: read the key and/or value at index inPos
|*/
void
morkMap::get_assoc(void* outKey, void* outVal, mork_pos inPos) const
{
  mork_num valSize = this->FormValSize();
  if ( valSize && outVal ) /* map holds values? caller wants the value? */
  {

valSize is 4, which is wrong.  I assume this value was set by the person
who saved it, but I dont know where that was yet.  My stack frame looks
like this:

>0  0x3000f84d6dc in ((morkMap*)0x14047cb00)->get_assoc(outKey=0x0,
outVal=0x11fffaf38, inPos=0)
"/house/jnance/mbuild/mozilla/db/mork/src/morkMap.cpp":268
    268   if ( valSize && outVal ) /* map holds values? caller wants the value? 
>1  0x3000f84e614 in ((morkMap*)0x14047cb00)->Get(ev=0x14025ec00,
inKey=0x11fffaf40, outKey=0x0, outVal=0x11fffaf38, outChange=0x0)
"/house/jnance/mbuild/mozilla/db/mork/src/morkMap.cpp":683
    683       this->get_assoc(outKey, outVal, i);
#3  0x3000f859e4c in ((morkAtomRowMap*)0x14047cb00)->GetAid(ev=0x14025ec00,
inAid=129)
"/house/jnance/mbuild/mozilla/db/mork/src/morkAtomMap.h~alt~deccxx_CDDAEC6C":334
#4  0x3000f85ae60 in ((morkRowSpace*)0x140243f80)->FindRow(ev=0x14025ec00,
inCol=130, inYarn=0x11fffb0d8)
"/house/jnance/mbuild/mozilla/db/mork/src/morkRowSpace.cpp":554
#5  0x3000f85ff00 in ((morkStore*)0x1401fd800)->FindRow(ev=0x14025ec00,
inScope=128, inColumn=130, inYarn=0x11fffb0d8)
"/house/jnance/mbuild/mozilla/db/mork/src/morkStore.cpp":1204
#6  0x3000f83c104 in ((orkinStore*)0x14025e800)->FindRow(mev=0x14025ed48,
inRowScope=128, inColumn=130, inTargetCellValue=0x11fffb0d8,
outRowOid=0x11fffb0d0, acqRow=0x11fffb0c8)
"/house/jnance/mbuild/mozilla/db/mork/src/orkinStore.cpp":685
#7  0x3000f009dd4 in
((nsGlobalHistory*)0x1402a5fa0)->SetPageTitle(aURL=0x1409edd10="file:///house/jnance/atom.html",
aTitle=0x1409ee080)
"/house/jnance/mbuild/mozilla/xpfe/components/history/src/nsGlobalHistory.cpp":664
#8  0x3000d01d0e0 in ((nsDocShell*)0x140985300)->SetTitle(aTitle=0x1409ee080)
"/house/jnance/mbuild/mozilla/docshell/base/nsDocShell.cpp":1633
#9  0x30012bc6fb0 in ((nsHTMLDocument*)0x140a38800)->SetTitle(aTitle=const class
{ ... })
"/house/jnance/mbuild/mozilla/layout/html/document/src/nsHTMLDocument.cpp":792
(Reporter)

Comment 33

18 years ago
I found the problem, and I have a fix.  I am not exactly sure its the
right fix, but it works.  The problem is we are storing pointers in a
hash table, and when we created the hash table, we told it the size of
the stuff we were putting in it was 4, which is wrong on 64 bit OSs.

This fix makes things run:

Index: morkAtomMap.cpp
===================================================================
RCS file: /cvsroot/mozilla/db/mork/src/morkAtomMap.cpp,v
retrieving revision 1.4
diff -u -r1.4 morkAtomMap.cpp
--- morkAtomMap.cpp     1999/11/06 03:17:06     1.4
+++ morkAtomMap.cpp     2000/04/21 20:41:55
@@ -423,7 +423,7 @@
 
 morkAtomRowMap::morkAtomRowMap(morkEnv* ev, const morkUsage& inUsage,
   nsIMdbHeap* ioHeap, nsIMdbHeap* ioSlotHeap, mork_column inIndexColumn)
-  : morkIntMap(ev, inUsage, sizeof(mork_aid), ioHeap, ioSlotHeap,
+  : morkIntMap(ev, inUsage, sizeof(mork_ip), ioHeap, ioSlotHeap,
     /*inHoldChanges*/ morkBool_kFalse)
 , mAtomRowMap_IndexColumn( inIndexColumn )
 {

I am not sure if the correct place to fix this is here, or down in
morkIntMap, but the result is going to be the same either way.
Comments?
(Reporter)

Comment 34

18 years ago
Created attachment 7824 [details] [diff] [review]
Patch to fix problem
(Assignee)

Comment 35

18 years ago
I'm not sure either - I suspect your way is better. Actually, I think your way 
is right - the parameter you changed is the size of the value to the map, and 
the values are pointers. Looks good.

Comment 36

18 years ago
I found that there's a morkPointerMap class.
Do you think we should use that instead of
morkIntMap?

However, there is a comment regarding morkPointerMap
that seems wrong:
    // keySize for morkPointerMap equals sizeof(mork_u4)

Another idea is to define mork_aid as mork_ip as opposed
to mork_token.
(Assignee)

Comment 37

18 years ago
I think mork_aid is an atom id, which probably is a 32 bit integer id. It looks 
like pointerMap is wrong, like you say, and may have different performance 
characteristics. If what you've done works, that seems good to me.

Comment 38

18 years ago
Please ignore my previous comments.  I didn't understand
the code.

I read the relevant mork code several times and now I finally
get it.  The value we store in morkAtomRowMap is (morkRow*),
so the morkAtomRowMap constructor should pass an inValSize
of sizeof(mork_ip) (== sizeof(morkRow*)) to the base class
morkIntMap constructor.  mork_aid is the key.  Since mork_aid
is defined as mork_token, which is defined as mork_u4, it is
correct to base morkAtomRowMap on morkIntMap, which uses
mork_u4 as the key.

Jim Nance's patch (id=7824) is correct. r=wtc@netscape.com.
(Assignee)

Comment 39

18 years ago
isn't this fixed now? Does whoever fixed it want to take it and mark it fixed?

Comment 40

18 years ago
I verified that the latest patch (id=7824) has
been checked in.  Marked the bug fixed.
Status: REOPENED → RESOLVED
Last Resolved: 18 years ago18 years ago
Resolution: --- → FIXED

Comment 41

18 years ago
Sorry for the spam.  New QA Contact for Browser General.  Thanks for your help
Joseph (good luck with the new job) and welcome aboard Doron Rosenberg
QA Contact: jelwell → doronr

Comment 42

18 years ago
can one of you please Verify that this is FIxed.  Neither Doron nor I are going 
to be able to verify this.  Thanks.
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.