RDF problems with RDFContentSinkImpl::OpenObject in some MingW builds

VERIFIED FIXED

Status

()

Core
RDF
VERIFIED FIXED
15 years ago
14 years ago

People

(Reporter: Jonathan Wilson, Assigned: dmose)

Tracking

(Blocks: 1 bug)

Trunk
x86
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3)

Attachments

(6 attachments, 1 obsolete attachment)

(Reporter)

Description

15 years ago
in some cases, building with MingW causes a crash related to
RDFContentSinkImpl::OpenObject.
If you back out the patch for bug 134113 which is labeled "RDF Changes V1.2" and
apply the patch labeled "RDF changes", it works.
(Assignee)

Comment 1

15 years ago
mingw port issue... taking
Assignee: rjc → dmose

Comment 2

15 years ago
Just had a quick look at the two versions of the RDF patches, and I see why the
first patch would fix this problem. However, I suspect it will leave other
issues unfixed.

I've been trying to sort out exactly why it crashed with a SIGFAULT type error
which according to Microsoft is due to a variable usage error (whatever that
means). However, if you look at
http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c295 you will see roughly
where I got to (and this also explains the reason why the earlier RDF patch
works. Since posting that comment, I stepped thru the code via gdb, and after a
lot of patience (I was using stepi in case I missed anything), I found that
Mozilla crashed when returning back to rv =
(gRDFContainerUtils->*(info->mMakeFn))(mDataSource, aContainer, nsnull); after
having travelled to various pieces of code. Don't know why it crashed at that
point, and I was hoping that cls' checkin for debug builds would help, but my
machine was shipped before I could try it.

I don't have a build machine at the moment, it's on a ship heading to New
Zealand from Boston, USA, so I can't do much work on this at the moment, unless
I manage to get hold of a better machine than what I'm using now.

I'm adding cls to the cc list due to his vast experience in WIN32 and gcc,
however that will not be an invitation for him to more than a lurker.
(Reporter)

Updated

15 years ago
Blocks: 203303
(Reporter)

Comment 3

15 years ago
Created attachment 122414 [details] [diff] [review]
patch

This patch changes from the newer code back to the older code.
(Reporter)

Comment 4

15 years ago
Comment on attachment 122414 [details] [diff] [review]
patch

Do we want to:
1.check this in and leave it at that?
2.check this in now and figure out a fix for the new code later?
or 3.figure out a fix for the new code now?
Attachment #122414 - Flags: review?(rjc)

Comment 5

15 years ago
there is no reason to check in some patch for a crasher until we have a stacktrace
for that crasher. IMHO.

Comment 6

15 years ago
If the patch fixes the problem, and passes all the required Mozilla
tests/smoketests, then it should be checked in. After all, as far as I can tell,
the majority of patches go in without a supporting stacktrace.

Having said that, having a stacktrace helps verify the accuracy of the bug fix.
I'd love to provide some trace info, but my "real" computer is on a container
ship somewhere, on its way to New Zealand from the USA.
(Assignee)

Comment 7

15 years ago
OK, this seems to be a GCC optimizer bug of some sort.  If I build with '-O
-march=pentium3' I see this bug.  If I only build with '-O', I don't.
(Assignee)

Comment 8

15 years ago
Comment on attachment 122414 [details] [diff] [review]
patch

Removing review request.  We should get GCC fixed instead.
Attachment #122414 - Flags: review?(rjc)
(Assignee)

Comment 9

15 years ago
Created attachment 122975 [details]
diffs between assembly without and with -march=pentium3

I've attached the diffs between the assembly code generated for
RDFContentSinkImpl::InitContainer without -march=pentium3 (which works), and
with -march=pentium3 (which hangs).

Comment 10

15 years ago
I seem to recall that "march" is automagically set somewhere (rules.mk?).

When you use just "-O", what is arch set to?
(Reporter)

Comment 11

15 years ago
Further to this problem, I get the crash error using:
current trunk seamonkey debug build (with no optomization settings set by me at all)
gcc 3.2.2 (mingw special 20030208-1)
this is the command like that gets passed to GCC:
g++ -mno-cygwin -o nsRDFContentSink.o -c -DOSTYPE=\"WINNT5.1\"
-DOSARCH=\"WINNT\"  -I../../../dist/include/xpcom -I../../../dist/include/string
-I../../../dist/include/rdfutil -I../../../dist/include/necko
-I../../../dist/include/layout -I../../../dist/include/content
-I../../../dist/include/htmlparser -I../../../dist/include/js
-I../../../dist/include/unicharutil -I../../../dist/include/rdf
-I../../../dist/include -I../../../dist/include/nspr           -fno-rtti
-fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align
-Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -pedantic -Wno-long-long
-mms-bitfields -pipe  -DDEBUG -D_DEBUG -DDEBUG_Jonathan_Wilson -DTRACING -g  
-DX_DISPLAY_MISSING=1 -DHAVE_SNPRINTF=1 -D_WINDOWS=1 -D_WIN32=1 -DWIN32=1
-DXP_WIN=1 -DXP_WIN32=1 -DHW_THREADS=1 -DWINVER=0x400 -DSTDC_HEADERS=1
-DNEW_H=\<new\> -DWIN32_LEAN_AND_MEAN=1 -DNO_X11=1 -D_X86_=1 -DD_INO=d_ino
-DSTDC_HEADERS=1 -Duid_t=int -Dgid_t=int -DHAVE_CPP_2BYTE_WCHAR_T=1
-DHAVE_DIRENT_H=1 -DHAVE_MEMORY_H=1 -DHAVE_UNISTD_H=1 -DHAVE_MALLOC_H=1
-DHAVE_MMINTRIN_H=1 -DNEW_H=\<new\> -DHAVE_LIBM=1 -DNO_X11=1
-DMMAP_MISSES_WRITES=1 -DHAVE_STRERROR=1 -DHAVE_SNPRINTF=1 -DHAVE_MEMMOVE=1
-DHAVE_RINT=1 -DVA_COPY=va_copy -DHAVE_VA_COPY=1 -DHAVE_IOS_BINARY=1
-DHAVE_CPP_EXPLICIT=1 -DHAVE_CPP_TYPENAME=1
-DHAVE_CPP_MODERN_SPECIALIZE_TEMPLATE_SYNTAX=1
-DHAVE_CPP_PARTIAL_SPECIALIZATION=1 -DHAVE_CPP_EXTERN_INSTANTIATION=1
-DHAVE_CPP_AMBIGUITY_RESOLVING_USING=1 -DHAVE_CPP_NAMESPACE_STD=1
-DHAVE_CPP_UNAMBIGUOUS_STD_NOTEQUAL=1 -DHAVE_CPP_NEW_CASTS=1
-DHAVE_CPP_DYNAMIC_CAST_TO_VOID_PTR=1 -DNEED_CPP_UNUSED_IMPLEMENTATIONS=1
-DMOZ_DEFAULT_TOOLKIT=\"windows\" -DMOZ_ENABLE_COREXFONTS=1 -DOJI=1 -DIBMBIDI=1
-DMOZ_XPINSTALL=1 -DMOZ_JSLOADER=1 -DMOZ_MATHML=1 -DMOZ_SVG=1 -DMOZ_LOGGING=1
-DDETECT_WEBSHELL_LEAKS=1 -DMOZ_USER_DIR=\".mozilla\" -DCPP_THROW_NEW=throw\(\)
-DMOZ_XUL=1 -DMOZ_PROFILESHARING=1 -DMOZ_DLL_SUFFIX=\".dll\" -DJS_THREADSAFE=1
-DNS_PRINT_PREVIEW=1 -DNS_PRINTING=1 -DMOZ_REFLOW_PERF=1 -DMOZ_REFLOW_PERF_DSP=1
-DMOZILLA_VERSION=\"1.4b\"  -D_MOZILLA_CONFIG_H_ -DMOZILLA_CLIENT
./nsRDFContentSink.cpp

my .mozconfig is set to:
ac_add_options --enable-calendar
ac_add_options --enable-crypto
ac_add_options --disable-accessibility
ac_add_options --enable-extensions=all
ac_add_options --enable-svg
ac_add_options --disable-activex
and I have my environment variable set to:
set CC=gcc
set CXX=g++
set AS=as
set LD=ld
set CPP=cpp
set
PATH=c:\moztools\bin;c:\mingw\bin;c:\cygwin\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\WINDOWS\system32\WBEM;c:\WINDOWS;C:\WINDOWS\COMMAND;c:\ssh
set moz_tools=c:\moztools
set HOME=c:\home
set MOZ_INTERNAL_LIBART_LGPL=1
I am using the standard specs file from the gcc build in question.

I am building on a
Intel(R) Pentium(R) 4 CPU 2.40GHz (if that makes any difference)

Comment 12

15 years ago
OK, so this problem happens with a Pentium 4 and an AMD Duron CPU. I get the
crash with an AMD. So, it's not a Intel vs AMD issue, although architectures
within these brands may be a different story.
-mpentium{3,4} are rather fragile, since they were new in 3.2. Does gcc3.3 work?
What about 3.2.3?

I doubt that you'll see much difference over an i686 build. I'd be interested to
know if we do on unix, at least.
(Assignee)

Comment 14

15 years ago
jonwil: can you upload the assembly generated for that function?  just do "make
RDFContentSinkImpl.s" in the appropriate directory, and then upload the relevant
portion of that file.
(Reporter)

Comment 15

15 years ago
Created attachment 123613 [details]
Assembly listing from my debug builds.
(Assignee)

Comment 16

15 years ago
jonwil: actually, I need the assembly for InitContainer, not OpenObject.  Sorry
for not being more specific.
(Reporter)

Comment 17

15 years ago
Created attachment 123618 [details]
correct patch

correct patch attatched
(Assignee)

Comment 18

15 years ago
OK, interesting: turning off -O makes it fail for me too.  So the only case were
gcc produces the correct code is with -O and without -march=pentium3.
(Assignee)

Updated

15 years ago
Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3
(Reporter)

Comment 19

15 years ago
Best guess on this problem is that its something with GCC.
There is an unofficial build of GCC 3.3 here:
http://www.thisiscool.com/gcc33_mingw.htm
I would try it but I dont have a tree ATM, so we need someone with a MingW build to:
1.see if using this GCC 3.3 build solves the RDF bug
and 2.post the ASM for the problematic function from various builds

Comment 20

15 years ago
FYI,

I've pulled a fresh tree (Trunk not 1.4final) and I've downloaded gcc3.3 as per
comment #13. Building now (although I've already had to fix a problem with the
compiler not finding new.h), so be patient.

Comment 21

15 years ago
Sorry for spam, my earlier message should have said comment #19 (and not comment
#13).

Comment 22

15 years ago
I finally got Mozilla built with the gcc version mentioned in comment #19 (i.e.
v3.3). Same problem happens in RDFContentSinkImpl. I didn't have -O set, so will
try with it set.

Comment 23

15 years ago
Well, with gcc3.3 and with -O (--enable-optimize) my MingW build worked fine.
Without -O it doesn't. So, do we still have a gcc problem, or a code problem
that the optimizer magically fixes.

Comment 24

15 years ago
I'm going to do some more testing with gcc3.3 and the earlier "approved release"
playing with my .mozconfig based on what I found in about:buildconfig.

If you want any ASM listings, send me some instructions, and I'll attach them
for you.

Comment 25

15 years ago
Ignore my comment #22, lack of sleep and WinXP restore points made me think I
was using gcc 3.3 when I really wasn't. That's what happens when you apply a MS
patch that breaks your internet access.

Anyway, I'm now really on gcc3.3, and still having all sorts of bother with
new.h, or specifically <new.h> in line 36 of
source/string/public/nsBufferHandleUtils.h, which actually uses NEW_H which is
defined in source/configure around line 5140 and 7154. I think I have a fix, but
the INCLUDE path is starting to look rather horrible...

Comment 26

15 years ago
Haveing sorted out the new/new.h issue, I'm now hitting a new problem which must
be gcc 3.3 related otherwise the tinderboxes would be red.

My mozilla/config/nsBuildID.h is fine until it gets to :
"define GRE_BUILD_ID "1.5a_0000000000
this should be
#define GRE_BUILD_ID "1.5a_0000000000"
but I can't figure out where this problem is coming from. The file is auto
generated if it doesn't already exist, but the code I've found seems fine. Any
help would be appreciated.

Comment 27

15 years ago
OK, I've now sorted out the GRE_BUILD_ID thing. Apparantly, the latest version
of cygwin Perl has a minor change which requires me to set an Environment
variable PERLIO=crlf, if not set it defaults to stdio, which isn't good (as I
found out).

Now, of course, I've hit another snag.
Info: resolving __ZTV9nsHashKey by linking to __imp___ZTV9nsHashKey (auto-import
)
nsSpecialSystemDirectory.o(.text$_ZN9nsHashKeyC2Ev+0xb): In function `ZNK20Syste
mDirectoriesKey5CloneEv':
c:/mozilla/source/mozilla/xpcom/obsolete/nsSpecialSystemDirectory.cpp: variable
'vtable for nsHashKey' can't be auto-imported. Please read the documentation for
 ld's --enable-auto-import for details.
Creating library file: libxpcom_compat.dll.a
make[3]: *** [xpcom_compat.dll] Error 1


And no, I haven't read the suggested documentation, mainly because I thought
"auto-import" was already set and happening.

Comment 28

15 years ago
I've given up on gcc3.3 from comment #19.

However, I noticed that gcc 3.2.3 is now available from MinGW. Using that
version, I see similar results as with gcc 2.95. Works fine when I set
enable_optimize, and crashes when I don't.

Comment 29

15 years ago
this crash seems to occur with -O2 , and can only be prevent by using -O
(default for --enable-optimize).  i will try to do some diffs for the assembly
and do so more testing to see which specific optimization option(s) seem to
cause the problem to reoccur at -O2.  I have a stacktrace of the crash if anyone
thinks it would be helpful.

Comment 30

15 years ago
Will, if your stacktrace is different from that posted at
http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c287 then please add it to
this bug. In fact, since I had trouble re-finding that trace, maybe you should
add it anyway.

Comment 31

15 years ago
this is really weird.  i can build successfully at -O2 when building without
--disable-debug , but not with it.  my .mozconfig is the same in all other
respects and i am doing totally clean builds.

Comment 32

15 years ago
Created attachment 130763 [details] [diff] [review]
force -O

Comment 33

15 years ago
Whoops.  I meant to add that this is the workaround that I've been using on my
builds for a couple of months now.  I normally build with --enable-optimize=-O2
without a problem when using the patch.

Comment 34

15 years ago
*** Bug 218900 has been marked as a duplicate of this bug. ***

Comment 35

15 years ago
My build seems to have started working, possibly due to applying that patch.
Created attachment 131408 [details] [diff] [review]
patch

This solves the problem for real for me.  I got the idea from
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11893 , which is what I think we're
hitting here.
Except that patch doesn't build on MSVC.  Working on a fix.
Created attachment 131417 [details] [diff] [review]
patch which works on msvc and mingw gcc

This changes NS_STDCALL_FUNPROTO to take more arguments so that it can be
implemented with typeof() on gcc, and using the more verbose prototype
elsewhere.  I'd really prefer to not have to use typeof()... while it's more
compact, it requires having access to a method declared on the class which is
of the type you're declaring a pointer to.  I can't think of any situations in
our code where you wouldn't have access to such a function, but I think it
makes the declaration look less generic.
Attachment #131408 - Attachment is obsolete: true
Attachment #131417 - Flags: superreview?(dbaron)
Attachment #131417 - Flags: review?(cls)
Attachment #131417 - Flags: superreview?(dbaron) → superreview+

Updated

15 years ago
Attachment #131417 - Flags: review?(cls) → review+
checked in.
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED

Comment 40

15 years ago
My mingw builds and runs fine, also "fixes" bug 217124.
Status: RESOLVED → VERIFIED

Updated

14 years ago
Blocks: 251535
You need to log in before you can comment on or make changes to this bug.