nsCompressedCharMap crashes during startup on 64bit Solaris.

RESOLVED FIXED in mozilla0.9.6

Status

()

--
critical
RESOLVED FIXED
17 years ago
7 years ago

People

(Reporter: pavlov, Assigned: bstell)

Tracking

({64bit})

Trunk
mozilla0.9.6
Sun
Solaris
64bit

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

17 years ago
stack trace:
=>[1] nsCompressedCharMap::SetChar(this = 0xffffffff7fff4d10, aChar = 338U),
line 156 in "nsCompressedCharMap.cpp"
  [2] InitGlobals(), line 810 in "nsFontMetricsGTK.cpp"
  [3] nsFontMetricsGTK::Init(this = 0x10050a490, aFont = STRUCT, aLangGroup =
0x100407380, aContext = 0x100441250), line 1035 in "nsFontMetricsGTK.cpp"
  [4] nsFontCache::GetMetricsFor(this = 0x100506bc0, aFont = STRUCT, aLangGroup
= 0x100407380, aMetrics = (nil)), line 568 in "nsDeviceContext.cpp"
  [5] DeviceContextImpl::GetMetricsFor(this = 0x100441250, aFont = STRUCT,
aLangGroup = 0x100407380, aMetrics = (nil)), line 233 in "nsDeviceContext.cpp"
  [6] ComputeLineHeight(aRenderingContext = 0x100508910, aStyleContext =
0x1004ffd18), line 2140 in "nsHTMLReflowState.cpp"
  [7] nsHTMLReflowState::CalcLineHeight(aPresContext = 0x10040c290,
aRenderingContext = 0x100508910, aFrame = 0x1004ffd80), line 2182 in
"nsHTMLReflowState.cpp"
  [8] nsBlockReflowState::nsBlockReflowState(this = 0xffffffff7fffa140,
aReflowState = STRUCT, aPresContext = 0x10040c290, aFrame = 0x1004ffd80,
aMetrics = STRUCT, aBlockMarginRoot = 4194304), line 156 in
"nsBlockReflowState.cpp"
  [9] nsBlockFrame::Reflow(this = 0x1004ffd80, aPresContext = 0x10040c290,
aMetrics = STRUCT, aReflowState = STRUCT, aStatus = 0), line 693 in
"nsBlockFrame.cpp"

I am seeing this on a build on Solaris 8 built with Forte 6U2 with -xarch=v9

Comment 1

17 years ago
In LXR, there's nothing at line 156 in the current version of nsCompressedCharMap.cpp
http://lxr.mozilla.org/seamonkey/source/gfx/src/nsCompressedCharMap.cpp

Which version of Mozilla are you using?
(Reporter)

Comment 2

17 years ago
It ends up being line 171 because of the license changes.  There haven't been any
other changes to the file.  I will update my tree, though, so my line numbers
will be right.
(Assignee)

Comment 3

17 years ago
Pav: is sheep a 64 bit system?

If not is there a system I can build/debug on?
(Reporter)

Comment 4

17 years ago
yeah, sheep (can be) a 64bit system.

add /opt/64bit/bin at the beginning of PATH and /opt/64bit/lib to the beginning
of LD_LIBRARY_PATH (this is where I installed 64bit glib/gtk/libIDL libraries on
sheep)
then set CC to "cc -xarch=v9" and CXX to "CC -xarch=v9" and ASFLAGS="-xarch=v9"
run configure as you normally would, and build.. when it is done, you'll have a
64bit build.  dbx/workshop work as normal.
Keywords: 64bit
(Assignee)

Comment 5

17 years ago
okay, made the indicated changes and I have started a build
(Assignee)

Comment 6

17 years ago
It seems to be failing to find a 64 bit thread locking routine.

rm -f libmozjs.so
CC -xarch=v9 -I/usr/openwin/include -mt  -DDEBUG -DDEBUG_ -DTRACING -g -G
-Qoption ld -z,muldefs -h libmozjs.so -o libmozjs.so  jsapi.o jsarena.o
jsarray.o jsatom.o jsbool.o jscntxt.o jsdate.o jsdbgapi.o jsdhash.o jsdtoa.o
jsemit.o jsexn.o jsfun.o jsgc.o jshash.o jsinterp.o jslock.o jslog2.o jslong.o
jsmath.o jsnum.o jsobj.o jsopcode.o jsparse.o jsprf.o jsregexp.o jsscan.o
jsscope.o jsscript.o jsstr.o jsutil.o jsxdrapi.o prmjtime.o lock_SunOS.o  
-xildoff    -lm -lposix4 -ldl -lnsl -lsocket -L../../dist/bin
-L/builds/bstell/mozilla/dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl 
-lsocket -ldl -lm    
ld: fatal: file lock_SunOS.o: wrong ELF class: ELFCLASS32
ld: fatal: File processing errors. No output written to libmozjs.so
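
The "wrong ELF class: ELFCLASS32" message means lock_SunOS.o is a 32-bit object being linked into a 64-bit library. As a rough illustration only (this is not part of the Mozilla build), the class is recorded in the fifth byte of the ELF identification header. The helper name ElfClassOf and the enum below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// ELF class values as defined by the ELF specification.
enum ElfClass { kElfNone = 0, kElf32 = 1, kElf64 = 2 };

// Hypothetical helper: look at the first bytes of an object file image and
// report its ELF class, or kElfNone if the bytes are not an ELF header.
inline ElfClass ElfClassOf(const unsigned char* bytes, std::size_t len) {
  assert(bytes != nullptr || len == 0);
  if (len < 5) return kElfNone;
  // Magic number: 0x7f 'E' 'L' 'F'.
  if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F')
    return kElfNone;
  // e_ident[EI_CLASS] lives at index 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
  switch (bytes[4]) {
    case 1: return kElf32;
    case 2: return kElf64;
    default: return kElfNone;
  }
}
```

Reading the first five bytes of lock_SunOS.o and passing them to such a helper would be expected to report the 32-bit class here, matching the linker's complaint.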
(Reporter)

Comment 7

17 years ago
is this build on top of another build or a fresh tree?  sun's cache might be
getting confused if this is on top of another build.  I would recommend doing a
'gmake -f client.mk distclean' on the tree.
(Assignee)

Comment 8

17 years ago
I did a "gmake -f client.mk distclean" then a "./configure" before the build

Comment 9

17 years ago
Looks like lock_SunOS.s wasn't built using -xarch=v9.  Can you double check
ASFLAGS in config/autoconf.mk and make sure it was set.  Also, can you check the
compile line in the log to see how lock_SunOS.o was built?

(Assignee)

Comment 10

17 years ago
config/autoconf.mk:

  ASFLAGS         =  -K PIC -L -P -D_ASM -D__STDC__=0
(Assignee)

Comment 11

17 years ago
/usr/ccs/bin/as -o lock_SunOS.o -K PIC -L -P -D_ASM -D__STDC__=0  lock_SunOS.s

Comment 12

17 years ago
Ok, that's the problem.  Re-reading your previous comment, I don't see where
CC/CXX/ASFLAGS were passed into the build.  You need to either add those
settings to your mozconfig or pass them on the ./configure line.

Add:
CC="cc -xarch=v9"
CXX="CC -xarch=v9"
ASFLAGS="-xarch=v9"

to ~/.mozconfig

or

env CC="cc -xarch=v9" CXX="CC -xarch=v9" ASFLAGS="-xarch=v9" ./configure
(Assignee)

Comment 13

17 years ago
okay, I finally have a build.
(Assignee)

Comment 14

17 years ago
okay, I'll set "-g" in CFLAGS and CCFLAGS and see if I get debug symbols

Comment 15

17 years ago
bstell: mark it assigned if you agree to work on it.
(Assignee)

Updated

17 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 16

17 years ago
Here is the error:

  signal BUS (invalid address alignment) 

Looks like the array needs to be 64 bit aligned.

Comment 17

17 years ago
bstell, sorry I didn't catch this 64-bit impurity in review.  RISCs generally
require natural alignment.  The only way to ensure it is with a union around the
array of PRUint16s.  That'll cost an extra "u." member name and dot operator,
but no big deal.

/be
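
A minimal sketch of the union idea described above, assuming ALU_TYPE is a 64-bit integer. The class name AlignedCharMap and the length kMaxLen are hypothetical stand-ins (kMaxLen is not the real CCMAP_MAX_LEN), and this is not the patch that was later attached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint64_t ALU_TYPE;   // assumption: the widest access is 64 bits
typedef std::uint16_t PRUint16;   // stand-in for the NSPR typedef
const std::size_t kMaxLen = 256;  // hypothetical; not the real CCMAP_MAX_LEN

class AlignedCharMap {
 public:
  AlignedCharMap() : mUsedLen(0), mAllOnesPage(0) {
    // Debug check: the map must start on an ALU_TYPE boundary.
    assert(reinterpret_cast<std::uintptr_t>(u.mCCMap) % alignof(ALU_TYPE) == 0);
  }
  PRUint16* Map() { return u.mCCMap; }

 protected:
  PRUint16 mUsedLen;
  PRUint16 mAllOnesPage;
  union {
    PRUint16 mCCMap[kMaxLen];
    ALU_TYPE mAlign;  // never read; present only to raise the union's alignment
  } u;
};

static_assert(alignof(AlignedCharMap) >= alignof(ALU_TYPE),
              "the class inherits the union's alignment");
```

The cost is the one mentioned above: every access to the array goes through the extra "u." member name.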

Updated

17 years ago
Blocks: 101793

Comment 18

17 years ago
Oh, and (of course) round up to a 0 mod 8 byte boundary when allocating from the
map -- is that going to waste too much space?  We have to 0 mod 4 align for
uint32 access, already.

/be
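
The rounding mentioned above can be sketched as follows, assuming offsets into the map are counted in PRUint16 units and the widest access is 8 bytes. RoundUpToAlu and the two constants are hypothetical names, not functions from the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kAluBytes = 8;  // assumption: 64-bit ALU_TYPE
const std::size_t kUnitsPerAlu = kAluBytes / sizeof(std::uint16_t);  // 4 units

// Round an offset (counted in 16-bit units) up to the next 0 mod 8 byte boundary.
inline std::size_t RoundUpToAlu(std::size_t offsetInUnits) {
  std::size_t rounded =
      (offsetInUnits + kUnitsPerAlu - 1) / kUnitsPerAlu * kUnitsPerAlu;
  assert(rounded % kUnitsPerAlu == 0);
  return rounded;
}
```

Under these assumptions each allocation wastes at most three units (6 bytes), compared with at most one unit (2 bytes) for the existing 0 mod 4 rounding.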
(Assignee)

Comment 19

17 years ago
Created attachment 52160 [details] [diff] [review]
patch; force all to use 16 bit access
(Assignee)

Comment 20

17 years ago
Attachment 52160 [details] [diff] forces the map into 16 bit access. This stops the crash.

A complete fix would probably involve typing the memory arrays (both
stack and heap) to ALU_TYPE and doing casts for all the 16 bit accesses.

At present the 64 bit version runs, the profile manager looks okay, 
but the pages are completely blank. Not even images show. I believe 
this is unrelated but it prevents me from verifying this patch.
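
A sketch of what forcing 16 bit access amounts to, assuming a page of 16 PRUint16 entries. This is an illustration of the idea, not the contents of attachment 52160, and CopyPage16 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint16_t PRUint16;   // stand-in for the NSPR typedef
const std::size_t kPageLen = 16;  // assumption: 16 PRUint16 entries per page

// Copy one page using only 16-bit loads and stores. Any PRUint16 array is
// suitably aligned for these accesses, wherever the page starts in the map.
inline void CopyPage16(PRUint16* dst, const PRUint16* src) {
  assert(dst != nullptr && src != nullptr);
  for (std::size_t i = 0; i < kPageLen; ++i) {
    dst[i] = src[i];
  }
}
```

The trade-off is more memory operations per page than a copy in ALU_TYPE-sized pieces, in exchange for having no alignment requirement beyond 2 bytes.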

(Assignee)

Updated

17 years ago
Target Milestone: --- → mozilla0.9.5
(Assignee)

Comment 21

17 years ago
This close to 0.9.4 branch I'd prefer to get the simplest fix in.
(Assignee)

Comment 22

17 years ago
local files display but remote URLs do not
(Assignee)

Comment 23

17 years ago
failing to display remote URLs is probably a separate bug
(Assignee)

Comment 24

17 years ago
When I click the off-line icon I get this error:

###!!! ASSERTION: Should have thread when shutting down.: 'Not Reached', file
nsSocketTransportService.cpp, line 733
###!!! Break: at file nsSocketTransportService.cpp, line 733
JavaScript error: 
 line 0: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIIOService.offline]"  nsresult: "0x80004005
(NS_ERROR_FAILURE)"  location: "JS frame ::
chrome://communicator/content/utilityOverlay.js :: toggleOfflineStatus :: line
69"  data: no]
(Reporter)

Comment 25

17 years ago
Comment on attachment 52160 [details] [diff] [review]
patch; force all to use 16 bit access

r=pavlov
Attachment #52160 - Flags: review+

Comment 26

17 years ago
Comment on attachment 52160 [details] [diff] [review]
patch; force all to use 16 bit access

Good for 0.9.5, sr=brendan@mozilla.org.

Please leave this bug open so we can look into wider memory accesses for 0.9.6 trunk.
Attachment #52160 - Flags: superreview+

Comment 27

17 years ago
Comment on attachment 52160 [details] [diff] [review]
patch; force all to use 16 bit access

a=asa (on behalf of drivers) for checkin to 0.9.5.
Attachment #52160 - Flags: approval+

Comment 28

17 years ago
Did this get checked in to the m0.9.5 branch?
If so, please move it to m0.9.6 if you want to keep it open.
(Assignee)

Updated

17 years ago
Target Milestone: mozilla0.9.5 → mozilla0.9.6
(Assignee)

Updated

17 years ago
Target Milestone: mozilla0.9.6 → mozilla0.9.7
(Assignee)

Comment 29

17 years ago
from bobbell@zk3.dec.com in bug 108950:

> nsCompressedCharMap::SetChars accesses unaligned memory.  With the default
> settings on Tru64 UNIX, Tru64 UNIX correctly detects this, corrects it, and
> prints a warning message.  However, Tru64 UNIX can also be set to crash with
> this behavior, and it is technically incorrect.
> 
> The problem was discovered using a recent nightly build.  Lines 298 and 299 of
> nsCompressedCharMap.cpp are at fault.  They read:
>     NS_ASSERTION(page[i]==0, "this page should be unused");
>     page[i] = aPage[i];
> 
> page (from my crash dump) is a pointer on a four byte boundary.  This is
> because it is an offset into mCCMap, which is an array of 16-bit data types.  
> However, page is a pointer to ALU_TYPE, which on Tru64 UNIX is a 64-bit data 
> type.  Thus, page is not properly aligned.
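
One way to confirm this kind of report without waiting for a crash is a runtime alignment check before the wide access. The template below is a sketch with a hypothetical name (IsAlignedFor); it is not code from the tree.

```cpp
#include <cassert>
#include <cstdint>

// True if p may be dereferenced as a T without violating T's alignment.
template <typename T>
inline bool IsAlignedFor(const void* p) {
  assert(p != nullptr);
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

Wrapping the cast in something like NS_ASSERTION(IsAlignedFor<ALU_TYPE>(page), ...) would flag the problem in debug builds even on platforms that silently tolerate or fix up misaligned accesses.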

(Assignee)

Comment 30

17 years ago
What I do not understand is where the misalignment comes from. (I would 
appreciate anyone pointing out what I am missing or where I am mistaken).

The pages are each 16 shorts (32 bytes) so the page-to-page distance should 
maintain the same ALU boundary alignment as the start of the map.

In the section around line 299 the code that accesses the page does so 
in ALU sized groups so it should maintain the same ALU boundary alignment as 
the start of the page.

Doesn't malloc return memory that is aligned to the largest ALU size?
If not how could the code safely alloc space for the largest ALU?

Is the base CCMap address on a 4 byte boundary?
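
The stride argument can be checked with plain address arithmetic. The sketch below assumes an 8-byte ALU_TYPE and uses made-up base addresses; all names in it are hypothetical. It shows that a 32-byte stride preserves whatever alignment the map's base has, so the pages are ALU-aligned only when the base is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kPageBytes = 16 * sizeof(std::uint16_t);  // 32 bytes per page
const std::size_t kAluBytes = 8;  // assumption: 64-bit ALU_TYPE

static_assert(kPageBytes % kAluBytes == 0,
              "the page stride is a whole number of ALU words");

// Address of page n, given the address of the start of the map.
inline std::uint64_t PageAddress(std::uint64_t mapBase, std::size_t n) {
  return mapBase + n * kPageBytes;
}

// True when an address is a multiple of the ALU width.
inline bool IsAluAligned(std::uint64_t addr) { return addr % kAluBytes == 0; }

// Made-up bases: one on an 8-byte boundary, one only on a 4-byte boundary.
inline void DemonstrateStride() {
  assert(IsAluAligned(PageAddress(0x1000, 3)));
  assert(!IsAluAligned(PageAddress(0x1004, 3)));
}
```

So the question reduces to whether the base of the map itself is ALU-aligned.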

Comment 31

17 years ago
I'm seeing this problem again with a tip v9 build using WS5.

(/opt/SUNWspro/WS5.0/bin/sparcv9/dbx) where
current thread: t@1
=>[1] nsCompressedCharMap::SetChar(this = 0xffffffff7fff5e80, aChar = 338U),
line 223 in "nsCompressedCharMap.cpp"
  [2] InitGlobals(), line 825 in "nsFontMetricsGTK.cpp"
  [3] nsFontMetricsGTK::Init(this = 0x1004e9ec0, aFont = STRUCT, aLangGroup =
0x100420340, aContext = 0x100441430), line 1050 in "nsFontMetricsGTK.cpp"
  [4] nsFontCache::GetMetricsFor(this = 0x1004e5a50, aFont = STRUCT, aLangGroup
= 0x100420340, aMetrics = (nil)), line 631 in "nsDeviceContext.cpp"
  [5] DeviceContextImpl::GetMetricsFor(this = 0x100441430, aFont = STRUCT,
aLangGroup = 0x100420340, aMetrics = (nil)), line 266 in "nsDeviceContext.cpp"
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsHTMLReflowState.o"
dbx: warning: see `help pathmap'
  [6] ComputeLineHeight(0x1004e9340, 0x1004e0360, 0xffffffff7fffb104,
0xffffffff74d9ef8c, 0x0, 0xffffffff74d7a808), at 0xffffffff74e1521c
  [7] nsHTMLReflowState::CalcLineHeight(0x1004417b0, 0x1004e9340, 0x1004e03c0,
0x0, 0x0, 0x0), at 0xffffffff74e1550c
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsBlockReflowState.o"
  [8] nsBlockReflowState::nsBlockReflowState(0xffffffff7fffb038,
0xffffffff7fffb4b8, 0x1004417b0, 0x1004e03c0, 0xffffffff7fffb5c8, 0x400000), at
0xffffffff74d9ef8c
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsBlockFrame.o"
  [9] nsBlockFrame::Reflow(0xffffffff7fffb5c8, 0x1004417b0, 0xffffffff7fffb5c8,
0xffffffff7fffb4b8, 0xffffffff7fffbc04, 0xffffffff755926c0), at 0xffffffff74d7ebb4
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsContainerFrame.o"
  [10] nsContainerFrame::ReflowChild(0x100489e90, 0x1004e03c0, 0x1004417b0,
0xffffffff7fffb5c8, 0xffffffff7fffb4b8, 0x0), at 0xffffffff74db3324
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsHTMLFrame.o"
  [11] CanvasFrame::Reflow(0x0, 0x0, 0xffffffff7fffbc04, 0xffffffff7fffb7f8,
0xffffffff7fffbc04, 0x2), at 0xffffffff74e055bc
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsBoxToBlockAdaptor.o"
  [12] nsBoxToBlockAdaptor::Reflow(0x1004e02d0, 0xffffffff7fffc920, 0x1004417b0,
0xffffffff7fffbbc0, 0xffffffff7fffcc98, 0xffffffff7fffbc04), at 0xffffffff75164ebc
  [13] nsBoxToBlockAdaptor::DoLayout(0x0, 0x0, 0x76c, 0x76c, 0x1,
0xffffffff75164588), at 0xffffffff7516435c
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsBox.o"
  [14] nsBox::Layout(0x1004e02d0, 0xffffffff7fffc920, 0xffffffff7fffbff8, 0x0,
0x0, 0x2), at 0xffffffff751553fc
dbx: warning: can't find file
"/space/home/cls/src/moz/main/obj-opt-ws-64-g/layout/build/nsScrollBoxFrame.o"


Comment 32

17 years ago
*** Bug 108950 has been marked as a duplicate of this bug. ***

Comment 33

17 years ago
In response to Brian Stell's comment #30:
> Doesn't malloc return memory that is aligned to the largest ALU size?
> If not how could the code safely alloc space for the largest ALU?
> Is the base CCMap address on a 4 byte boundary?

malloc() does indeed return memory that is aligned so that it can be used by any
data type (which here would make it 64-bit aligned).  However, here the memory
is not being explicitly malloc'ed.  The definition of the class
nsCompressedCharMap includes:
  protected:
    PRUint16 mUsedLen;   // in PRUint16
    PRUint16 mAllOnesPage;
    PRUint16 mCCMap[CCMAP_MAX_LEN];

Thus, mCCMap is only guaranteed to be aligned for PRUint16 access.

I believe what the Compaq cxx compiler is doing internally is aligning mUsedLen
on a 64-bit boundary (either intentionally or by chance), which puts mCCMap only
four bytes (two PRUint16's) later, which is not on a 64-bit boundary.

From some debug printfs I added:
  mCCMap @ 0x11fff30f4
  page_offset == 0x40
  page ==  0x11fff3174
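
The layout and the debug values above can be cross-checked with a small sketch. It assumes an 8-byte ALU_TYPE and that page_offset is counted in PRUint16 units; PlainLayout, kMaxLen, and the two functions are hypothetical, with kMaxLen standing in for CCMAP_MAX_LEN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint16_t PRUint16;   // stand-in for the NSPR typedef
const std::size_t kMaxLen = 256;  // hypothetical; not the real CCMAP_MAX_LEN

// Same member order as the class excerpt, as a plain struct so that
// offsetof can be applied to it.
struct PlainLayout {
  PRUint16 mUsedLen;
  PRUint16 mAllOnesPage;
  PRUint16 mCCMap[kMaxLen];
};

// Residue of an address modulo the 8-byte ALU width (assumed).
inline unsigned AluResidue(std::uint64_t addr) {
  return static_cast<unsigned>(addr % 8);
}

// Check that the addresses from the debug printfs are self-consistent.
inline void CheckReportedValues() {
  const std::uint64_t mapAddr = 0x11fff30f4ULL;   // mCCMap
  const std::uint64_t pageOffset = 0x40;          // assumed to be in PRUint16 units
  const std::uint64_t pageAddr = 0x11fff3174ULL;  // page
  assert(offsetof(PlainLayout, mCCMap) == 4);     // two PRUint16s precede the map
  assert(mapAddr + pageOffset * sizeof(PRUint16) == pageAddr);
  assert(AluResidue(mapAddr) == 4);
  assert(AluResidue(pageAddr) == 4);
}
```

Both the map and the page sit 4 bytes past an 8-byte boundary, which is consistent with the explanation that the object starts on a 64-bit boundary and mCCMap follows two PRUint16 members.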
(Assignee)

Comment 34

17 years ago
thanks for the insight

this I can fix
(Assignee)

Comment 35

17 years ago
Created attachment 57165 [details] [diff] [review]
patch; use a union to make the C++ object align the map on the largest ALU

bobbell: could you try this patch?

thanks
Attachment #52160 - Attachment is obsolete: true
(Assignee)

Comment 36

17 years ago
the patch was made in the gfx directory
(Assignee)

Updated

17 years ago
Target Milestone: mozilla0.9.7 → mozilla0.9.6

Comment 37

17 years ago
Comment on attachment 57165 [details] [diff] [review]
patch; use a union to make the C++ object align the map on the largest ALU

Why not give that ALU_TYPE dummy; member the canonical (and less insulting :-)
name, namely 'align'?

sr=brendan@mozilla.org in any event.

/be
Attachment #57165 - Flags: superreview+

Comment 38

17 years ago
bstell, can you get r= and then mail drivers@mozilla.org for a= to check in for
0.9.6?  Thanks,

/be
(Assignee)

Comment 39

17 years ago
okay, after only 4 hours I have a 64 bit build on sheep and it crashes without
the patch and runs with that patch. (This is so weird: both my linux systems
are still horked from the network upgrade :( so I'm using my Win98 system
to display the Solaris client.)

Comment 40

17 years ago
Comment on attachment 57165 [details] [diff] [review]
patch; use a union to make the C++ object align the map on the largest ALU

I don't see any problem with the patch. r=shanjian
Attachment #57165 - Flags: review+

Comment 41

17 years ago
a=blizzard on behalf of drivers for 0.9.6
Keywords: mozilla0.9.6+
(Assignee)

Comment 42

17 years ago
checked into 0.9.6 branch
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED