Closed Bug 203137 Opened 21 years ago Closed 21 years ago

RDF problems with RDFContentSinkImpl::OpenObject in some MingW builds

Categories

(Core Graveyard :: RDF, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jonwil, Assigned: dmosedale)

References

Details

(Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3)

Attachments

(6 files, 1 obsolete file)

in some cases, building with MingW causes a crash related to
RDFContentSinkImpl::OpenObject.
If you back out the patch for bug 134113 which is labeled "RDF Changes V1.2" and
apply the patch labeled "RDF changes", it works.
mingw port issue... taking
Assignee: rjc → dmose
Just had a quick look at the two versions of the RDF patches, and I see why the
first patch would fix this problem. However, I suspect it will leave other
issues unfixed.

I've been trying to sort out exactly why it crashed with a SIGFAULT type error
which according to Microsoft is due to a variable usage error (whatever that
means). However, if you look at
http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c295 you will see roughly
where I got to (and this also explains the reason why the earlier RDF patch
works. Since posting that comment, I stepped thru the code via gdb, and after a
lot of patience (I was using stepi in case I missed anything), I found that
Mozilla crashed when returning back to rv =
(gRDFContainerUtils->*(info->mMakeFn))(mDataSource, aContainer, nsnull); after
having travelled to various pieces of code. Don't know why it crashed at that
point, and I was hoping that cls' checkin for debug builds would help, but my
machine was shipped before I could try it.

I don't have a build machine at the moment, it's on a ship heading to New
Zealand from Boston, USA, so I can't do much work on this at the moment, unless
I manage to get hold of a better machine than what I'm using now.

I'm adding cls to the cc list due to his vast experience in WIN32 and gcc,
however that will not be an invitation for him to more than a lurker.
Blocks: mingw
Attached patch patchSplinter Review
This patch changes from the newer code back to the older code.
Comment on attachment 122414 [details] [diff] [review]
patch

Do we want to:
1.check this in and leave it at that?
2.check this in now and figure out a fix for the new code later?
or 3.figure out a fix for the new code now?
Attachment #122414 - Flags: review?(rjc)
there is no reason to check in some patch for a crasher until we have a stacktrace
for that crasher. IMHO.
If the patch fixes the problem, and passes all the required Mozilla
tests/smoketests, then it should be checked in. After all, as far as I can tell,
the majority of patches go in without a supporting stacktrace.

Having said that, having a stacktrace helps verify the accuracy of the bug fix.
I'd love to provide some trace info, but my "real" computer is on a container
ship somewhere, on its way to New Zealand from the USA.
OK, this seems to be a GCC optimizer bug of some sort.  If I build with '-O
-march=pentium3' I see this bug.  If I only build with '-O', I don't.
Comment on attachment 122414 [details] [diff] [review]
patch

Removing review request.  We should get GCC fixed instead.
Attachment #122414 - Flags: review?(rjc)
I've attached the diffs between the assembly code generated for
RDFContentSinkImpl::InitContainer without -march=pentium3 (which works), and
with -march=pentium3 (which hangs).
I seem to recall that "march" is automagically set somewhere (rules.mk?).

When you use just "-O", what is arch set to?
Further to this problem, I get the crash error using:
current trunk seamonkey debug build (with no optomization settings set by me at all)
gcc 3.2.2 (mingw special 20030208-1)
this is the command like that gets passed to GCC:
g++ -mno-cygwin -o nsRDFContentSink.o -c -DOSTYPE=\"WINNT5.1\"
-DOSARCH=\"WINNT\"  -I../../../dist/include/xpcom -I../../../dist/include/string
-I../../../dist/include/rdfutil -I../../../dist/include/necko
-I../../../dist/include/layout -I../../../dist/include/content
-I../../../dist/include/htmlparser -I../../../dist/include/js
-I../../../dist/include/unicharutil -I../../../dist/include/rdf
-I../../../dist/include -I../../../dist/include/nspr           -fno-rtti
-fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align
-Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -pedantic -Wno-long-long
-mms-bitfields -pipe  -DDEBUG -D_DEBUG -DDEBUG_Jonathan_Wilson -DTRACING -g  
-DX_DISPLAY_MISSING=1 -DHAVE_SNPRINTF=1 -D_WINDOWS=1 -D_WIN32=1 -DWIN32=1
-DXP_WIN=1 -DXP_WIN32=1 -DHW_THREADS=1 -DWINVER=0x400 -DSTDC_HEADERS=1
-DNEW_H=\<new\> -DWIN32_LEAN_AND_MEAN=1 -DNO_X11=1 -D_X86_=1 -DD_INO=d_ino
-DSTDC_HEADERS=1 -Duid_t=int -Dgid_t=int -DHAVE_CPP_2BYTE_WCHAR_T=1
-DHAVE_DIRENT_H=1 -DHAVE_MEMORY_H=1 -DHAVE_UNISTD_H=1 -DHAVE_MALLOC_H=1
-DHAVE_MMINTRIN_H=1 -DNEW_H=\<new\> -DHAVE_LIBM=1 -DNO_X11=1
-DMMAP_MISSES_WRITES=1 -DHAVE_STRERROR=1 -DHAVE_SNPRINTF=1 -DHAVE_MEMMOVE=1
-DHAVE_RINT=1 -DVA_COPY=va_copy -DHAVE_VA_COPY=1 -DHAVE_IOS_BINARY=1
-DHAVE_CPP_EXPLICIT=1 -DHAVE_CPP_TYPENAME=1
-DHAVE_CPP_MODERN_SPECIALIZE_TEMPLATE_SYNTAX=1
-DHAVE_CPP_PARTIAL_SPECIALIZATION=1 -DHAVE_CPP_EXTERN_INSTANTIATION=1
-DHAVE_CPP_AMBIGUITY_RESOLVING_USING=1 -DHAVE_CPP_NAMESPACE_STD=1
-DHAVE_CPP_UNAMBIGUOUS_STD_NOTEQUAL=1 -DHAVE_CPP_NEW_CASTS=1
-DHAVE_CPP_DYNAMIC_CAST_TO_VOID_PTR=1 -DNEED_CPP_UNUSED_IMPLEMENTATIONS=1
-DMOZ_DEFAULT_TOOLKIT=\"windows\" -DMOZ_ENABLE_COREXFONTS=1 -DOJI=1 -DIBMBIDI=1
-DMOZ_XPINSTALL=1 -DMOZ_JSLOADER=1 -DMOZ_MATHML=1 -DMOZ_SVG=1 -DMOZ_LOGGING=1
-DDETECT_WEBSHELL_LEAKS=1 -DMOZ_USER_DIR=\".mozilla\" -DCPP_THROW_NEW=throw\(\)
-DMOZ_XUL=1 -DMOZ_PROFILESHARING=1 -DMOZ_DLL_SUFFIX=\".dll\" -DJS_THREADSAFE=1
-DNS_PRINT_PREVIEW=1 -DNS_PRINTING=1 -DMOZ_REFLOW_PERF=1 -DMOZ_REFLOW_PERF_DSP=1
-DMOZILLA_VERSION=\"1.4b\"  -D_MOZILLA_CONFIG_H_ -DMOZILLA_CLIENT
./nsRDFContentSink.cpp

my .mozconfig is set to:
ac_add_options --enable-calendar
ac_add_options --enable-crypto
ac_add_options --disable-accessibility
ac_add_options --enable-extensions=all
ac_add_options --enable-svg
ac_add_options --disable-activex
and I have my environment variable set to:
set CC=gcc
set CXX=g++
set AS=as
set LD=ld
set CPP=cpp
set
PATH=c:\moztools\bin;c:\mingw\bin;c:\cygwin\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\WINDOWS\system32\WBEM;c:\WINDOWS;C:\WINDOWS\COMMAND;c:\ssh
set moz_tools=c:\moztools
set HOME=c:\home
set MOZ_INTERNAL_LIBART_LGPL=1
I am using the standard specs file from the gcc build in question.

I am building on a
Intel(R) Pentium(R) 4 CPU 2.40GHz (if that makes any difference)
OK, so this problem happens with a Pentium 4 and an AMD Duron CPU. I get the
crash with an AMD. So, it's not a Intel vs AMD issue, although architectures
within these brands may be a different story.
-mpentium{3,4} are rather fragile, since they were new in 3.2. Does gcc3.3 work?
What about 3.2.3?

I doubt that you'll see much difference over an i686 build. I'd be interested to
know if we do on unix, at least.
jonwil: can you upload the assembly generated for that function?  just do "make
RDFContentSinkImpl.s" in the appropriate directory, and then upload the relevant
portion of that file.
jonwil: actually, I need the assembly for InitContainer, not OpenObject.  Sorry
for not being more specific.
Attached file correct patch
correct patch attatched
OK, interesting: turning off -O makes it fail for me too.  So the only case were
gcc produces the correct code is with -O and without -march=pentium3.
Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3
Best guess on this problem is that its something with GCC.
There is an unofficial build of GCC 3.3 here:
http://www.thisiscool.com/gcc33_mingw.htm
I would try it but I dont have a tree ATM, so we need someone with a MingW build to:
1.see if using this GCC 3.3 build solves the RDF bug
and 2.post the ASM for the problematic function from various builds
FYI,

I've pulled a fresh tree (Trunk not 1.4final) and I've downloaded gcc3.3 as per
comment #13. Building now (although I've already had to fix a problem with the
compiler not finding new.h), so be patient.
Sorry for spam, my earlier message should have said comment #19 (and not comment
#13).
I finally got Mozilla built with the gcc version mentioned in comment #19 (i.e.
v3.3). Same problem happens in RDFContentSinkImpl. I didn't have -O set, so will
try with it set.
Well, with gcc3.3 and with -O (--enable-optimize) my MingW build worked fine.
Without -O it doesn't. So, do we still have a gcc problem, or a code problem
that the optimizer magically fixes.
I'm going to do some more testing with gcc3.3 and the earlier "approved release"
playing with my .mozconfig based on what I found in about:buildconfig.

If you want any ASM listings, send me some instructions, and I'll attach them
for you.
Ignore my comment #22, lack of sleep and WinXP restore points made me think I
was using gcc 3.3 when I really wasn't. That's what happens when you apply a MS
patch that breaks your internet access.

Anyway, I'm now really on gcc3.3, and still having all sorts of bother with
new.h, or specifically <new.h> in line 36 of
source/string/public/nsBufferHandleUtils.h, which actually uses NEW_H which is
defined in source/configure around line 5140 and 7154. I think I have a fix, but
the INCLUDE path is starting to look rather horrible...
Haveing sorted out the new/new.h issue, I'm now hitting a new problem which must
be gcc 3.3 related otherwise the tinderboxes would be red.

My mozilla/config/nsBuildID.h is fine until it gets to :
"define GRE_BUILD_ID "1.5a_0000000000
this should be
#define GRE_BUILD_ID "1.5a_0000000000"
but I can't figure out where this problem is coming from. The file is auto
generated if it doesn't already exist, but the code I've found seems fine. Any
help would be appreciated.
OK, I've now sorted out the GRE_BUILD_ID thing. Apparantly, the latest version
of cygwin Perl has a minor change which requires me to set an Environment
variable PERLIO=crlf, if not set it defaults to stdio, which isn't good (as I
found out).

Now, of course, I've hit another snag.
Info: resolving __ZTV9nsHashKey by linking to __imp___ZTV9nsHashKey (auto-import
)
nsSpecialSystemDirectory.o(.text$_ZN9nsHashKeyC2Ev+0xb): In function `ZNK20Syste
mDirectoriesKey5CloneEv':
c:/mozilla/source/mozilla/xpcom/obsolete/nsSpecialSystemDirectory.cpp: variable
'vtable for nsHashKey' can't be auto-imported. Please read the documentation for
 ld's --enable-auto-import for details.
Creating library file: libxpcom_compat.dll.a
make[3]: *** [xpcom_compat.dll] Error 1


And no, I haven't read the suggested documentation, mainly because I thought
"auto-import" was already set and happening.
I've given up on gcc3.3 from comment #19.

However, I noticed that gcc 3.2.3 is now available from MinGW. Using that
version, I see similar results as with gcc 2.95. Works fine when I set
enable_optimize, and crashes when I don't.
this crash seems to occur with -O2 , and can only be prevent by using -O
(default for --enable-optimize).  i will try to do some diffs for the assembly
and do so more testing to see which specific optimization option(s) seem to
cause the problem to reoccur at -O2.  I have a stacktrace of the crash if anyone
thinks it would be helpful.
Will, if your stacktrace is different from that posted at
http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c287 then please add it to
this bug. In fact, since I had trouble re-finding that trace, maybe you should
add it anyway.
this is really weird.  i can build successfully at -O2 when building without
--disable-debug , but not with it.  my .mozconfig is the same in all other
respects and i am doing totally clean builds.
Attached patch force -O Splinter Review
Whoops.  I meant to add that this is the workaround that I've been using on my
builds for a couple of months now.  I normally build with --enable-optimize=-O2
without a problem when using the patch.
*** Bug 218900 has been marked as a duplicate of this bug. ***
My build seems to have started working, possibly due to applying that patch.
Attached patch patch (obsolete) — Splinter Review
This solves the problem for real for me.  I got the idea from
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11893 , which is what I think we're
hitting here.
Except that patch doesn't build on MSVC.  Working on a fix.
This changes NS_STDCALL_FUNPROTO to take more arguments so that it can be
implemented with typeof() on gcc, and using the more verbose prototype
elsewhere.  I'd really prefer to not have to use typeof()... while it's more
compact, it requires having access to a method declared on the class which is
of the type you're declaring a pointer to.  I can't think of any situations in
our code where you wouldn't have access to such a function, but I think it
makes the declaration look less generic.
Attachment #131408 - Attachment is obsolete: true
Attachment #131417 - Flags: superreview?(dbaron)
Attachment #131417 - Flags: review?(cls)
Attachment #131417 - Flags: superreview?(dbaron) → superreview+
Attachment #131417 - Flags: review?(cls) → review+
checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
My mingw builds and runs fine, also "fixes" bug 217124.
Status: RESOLVED → VERIFIED
Blocks: 251535
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: