Closed Bug 203137 Opened 22 years ago Closed 21 years ago

RDF problems with RDFContentSinkImpl::OpenObject in some MingW builds

Categories

(Core Graveyard :: RDF, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jonwil, Assigned: dmosedale)

References

Details

(Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3)

Attachments

(6 files, 1 obsolete file)

in some cases, building with MingW causes a crash related to RDFContentSinkImpl::OpenObject. If you back out the patch for bug 134113 which is labeled "RDF Changes V1.2" and apply the patch labeled "RDF changes", it works.
mingw port issue... taking
Assignee: rjc → dmose
Just had a quick look at the two versions of the RDF patches, and I see why the first patch would fix this problem. However, I suspect it will leave other issues unfixed. I've been trying to sort out exactly why it crashed with a SIGFAULT type error which according to Microsoft is due to a variable usage error (whatever that means). However, if you look at http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c295 you will see roughly where I got to (and this also explains the reason why the earlier RDF patch works. Since posting that comment, I stepped thru the code via gdb, and after a lot of patience (I was using stepi in case I missed anything), I found that Mozilla crashed when returning back to rv = (gRDFContainerUtils->*(info->mMakeFn))(mDataSource, aContainer, nsnull); after having travelled to various pieces of code. Don't know why it crashed at that point, and I was hoping that cls' checkin for debug builds would help, but my machine was shipped before I could try it. I don't have a build machine at the moment, it's on a ship heading to New Zealand from Boston, USA, so I can't do much work on this at the moment, unless I manage to get hold of a better machine than what I'm using now. I'm adding cls to the cc list due to his vast experience in WIN32 and gcc, however that will not be an invitation for him to more than a lurker.
Blocks: mingw
Attached patch patchSplinter Review
This patch changes from the newer code back to the older code.
Comment on attachment 122414 [details] [diff] [review] patch Do we want to: 1.check this in and leave it at that? 2.check this in now and figure out a fix for the new code later? or 3.figure out a fix for the new code now?
Attachment #122414 - Flags: review?(rjc)
there is no reason to check in some patch for a crasher until we have a stacktrace for that crasher. IMHO.
If the patch fixes the problem, and passes all the required Mozilla tests/smoketests, then it should be checked in. After all, as far as I can tell, the majority of patches go in without a supporting stacktrace. Having said that, having a stacktrace helps verify the accuracy of the bug fix. I'd love to provide some trace info, but my "real" computer is on a container ship somewhere, on its way to New Zealand from the USA.
OK, this seems to be a GCC optimizer bug of some sort. If I build with '-O -march=pentium3' I see this bug. If I only build with '-O', I don't.
Comment on attachment 122414 [details] [diff] [review] patch Removing review request. We should get GCC fixed instead.
Attachment #122414 - Flags: review?(rjc)
I've attached the diffs between the assembly code generated for RDFContentSinkImpl::InitContainer without -march=pentium3 (which works), and with -march=pentium3 (which hangs).
I seem to recall that "march" is automagically set somewhere (rules.mk?). When you use just "-O", what is arch set to?
Further to this problem, I get the crash error using: current trunk seamonkey debug build (with no optomization settings set by me at all) gcc 3.2.2 (mingw special 20030208-1) this is the command like that gets passed to GCC: g++ -mno-cygwin -o nsRDFContentSink.o -c -DOSTYPE=\"WINNT5.1\" -DOSARCH=\"WINNT\" -I../../../dist/include/xpcom -I../../../dist/include/string -I../../../dist/include/rdfutil -I../../../dist/include/necko -I../../../dist/include/layout -I../../../dist/include/content -I../../../dist/include/htmlparser -I../../../dist/include/js -I../../../dist/include/unicharutil -I../../../dist/include/rdf -I../../../dist/include -I../../../dist/include/nspr -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -pedantic -Wno-long-long -mms-bitfields -pipe -DDEBUG -D_DEBUG -DDEBUG_Jonathan_Wilson -DTRACING -g -DX_DISPLAY_MISSING=1 -DHAVE_SNPRINTF=1 -D_WINDOWS=1 -D_WIN32=1 -DWIN32=1 -DXP_WIN=1 -DXP_WIN32=1 -DHW_THREADS=1 -DWINVER=0x400 -DSTDC_HEADERS=1 -DNEW_H=\<new\> -DWIN32_LEAN_AND_MEAN=1 -DNO_X11=1 -D_X86_=1 -DD_INO=d_ino -DSTDC_HEADERS=1 -Duid_t=int -Dgid_t=int -DHAVE_CPP_2BYTE_WCHAR_T=1 -DHAVE_DIRENT_H=1 -DHAVE_MEMORY_H=1 -DHAVE_UNISTD_H=1 -DHAVE_MALLOC_H=1 -DHAVE_MMINTRIN_H=1 -DNEW_H=\<new\> -DHAVE_LIBM=1 -DNO_X11=1 -DMMAP_MISSES_WRITES=1 -DHAVE_STRERROR=1 -DHAVE_SNPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_RINT=1 -DVA_COPY=va_copy -DHAVE_VA_COPY=1 -DHAVE_IOS_BINARY=1 -DHAVE_CPP_EXPLICIT=1 -DHAVE_CPP_TYPENAME=1 -DHAVE_CPP_MODERN_SPECIALIZE_TEMPLATE_SYNTAX=1 -DHAVE_CPP_PARTIAL_SPECIALIZATION=1 -DHAVE_CPP_EXTERN_INSTANTIATION=1 -DHAVE_CPP_AMBIGUITY_RESOLVING_USING=1 -DHAVE_CPP_NAMESPACE_STD=1 -DHAVE_CPP_UNAMBIGUOUS_STD_NOTEQUAL=1 -DHAVE_CPP_NEW_CASTS=1 -DHAVE_CPP_DYNAMIC_CAST_TO_VOID_PTR=1 -DNEED_CPP_UNUSED_IMPLEMENTATIONS=1 -DMOZ_DEFAULT_TOOLKIT=\"windows\" -DMOZ_ENABLE_COREXFONTS=1 -DOJI=1 -DIBMBIDI=1 -DMOZ_XPINSTALL=1 -DMOZ_JSLOADER=1 -DMOZ_MATHML=1 -DMOZ_SVG=1 -DMOZ_LOGGING=1 -DDETECT_WEBSHELL_LEAKS=1 -DMOZ_USER_DIR=\".mozilla\" -DCPP_THROW_NEW=throw\(\) -DMOZ_XUL=1 -DMOZ_PROFILESHARING=1 -DMOZ_DLL_SUFFIX=\".dll\" -DJS_THREADSAFE=1 -DNS_PRINT_PREVIEW=1 -DNS_PRINTING=1 -DMOZ_REFLOW_PERF=1 -DMOZ_REFLOW_PERF_DSP=1 -DMOZILLA_VERSION=\"1.4b\" -D_MOZILLA_CONFIG_H_ -DMOZILLA_CLIENT ./nsRDFContentSink.cpp my .mozconfig is set to: ac_add_options --enable-calendar ac_add_options --enable-crypto ac_add_options --disable-accessibility ac_add_options --enable-extensions=all ac_add_options --enable-svg ac_add_options --disable-activex and I have my environment variable set to: set CC=gcc set CXX=g++ set AS=as set LD=ld set CPP=cpp set PATH=c:\moztools\bin;c:\mingw\bin;c:\cygwin\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\COMMAND;C:\WINDOWS\system32\WBEM;c:\WINDOWS;C:\WINDOWS\COMMAND;c:\ssh set moz_tools=c:\moztools set HOME=c:\home set MOZ_INTERNAL_LIBART_LGPL=1 I am using the standard specs file from the gcc build in question. I am building on a Intel(R) Pentium(R) 4 CPU 2.40GHz (if that makes any difference)
OK, so this problem happens with a Pentium 4 and an AMD Duron CPU. I get the crash with an AMD. So, it's not a Intel vs AMD issue, although architectures within these brands may be a different story.
-mpentium{3,4} are rather fragile, since they were new in 3.2. Does gcc3.3 work? What about 3.2.3? I doubt that you'll see much difference over an i686 build. I'd be interested to know if we do on unix, at least.
jonwil: can you upload the assembly generated for that function? just do "make RDFContentSinkImpl.s" in the appropriate directory, and then upload the relevant portion of that file.
jonwil: actually, I need the assembly for InitContainer, not OpenObject. Sorry for not being more specific.
Attached file correct patch
correct patch attatched
OK, interesting: turning off -O makes it fail for me too. So the only case were gcc produces the correct code is with -O and without -march=pentium3.
Whiteboard: workaround: always use --enable-optimize and never use -march=pentium3
Best guess on this problem is that its something with GCC. There is an unofficial build of GCC 3.3 here: http://www.thisiscool.com/gcc33_mingw.htm I would try it but I dont have a tree ATM, so we need someone with a MingW build to: 1.see if using this GCC 3.3 build solves the RDF bug and 2.post the ASM for the problematic function from various builds
FYI, I've pulled a fresh tree (Trunk not 1.4final) and I've downloaded gcc3.3 as per comment #13. Building now (although I've already had to fix a problem with the compiler not finding new.h), so be patient.
Sorry for spam, my earlier message should have said comment #19 (and not comment #13).
I finally got Mozilla built with the gcc version mentioned in comment #19 (i.e. v3.3). Same problem happens in RDFContentSinkImpl. I didn't have -O set, so will try with it set.
Well, with gcc3.3 and with -O (--enable-optimize) my MingW build worked fine. Without -O it doesn't. So, do we still have a gcc problem, or a code problem that the optimizer magically fixes.
I'm going to do some more testing with gcc3.3 and the earlier "approved release" playing with my .mozconfig based on what I found in about:buildconfig. If you want any ASM listings, send me some instructions, and I'll attach them for you.
Ignore my comment #22, lack of sleep and WinXP restore points made me think I was using gcc 3.3 when I really wasn't. That's what happens when you apply a MS patch that breaks your internet access. Anyway, I'm now really on gcc3.3, and still having all sorts of bother with new.h, or specifically <new.h> in line 36 of source/string/public/nsBufferHandleUtils.h, which actually uses NEW_H which is defined in source/configure around line 5140 and 7154. I think I have a fix, but the INCLUDE path is starting to look rather horrible...
Haveing sorted out the new/new.h issue, I'm now hitting a new problem which must be gcc 3.3 related otherwise the tinderboxes would be red. My mozilla/config/nsBuildID.h is fine until it gets to : "define GRE_BUILD_ID "1.5a_0000000000 this should be #define GRE_BUILD_ID "1.5a_0000000000" but I can't figure out where this problem is coming from. The file is auto generated if it doesn't already exist, but the code I've found seems fine. Any help would be appreciated.
OK, I've now sorted out the GRE_BUILD_ID thing. Apparantly, the latest version of cygwin Perl has a minor change which requires me to set an Environment variable PERLIO=crlf, if not set it defaults to stdio, which isn't good (as I found out). Now, of course, I've hit another snag. Info: resolving __ZTV9nsHashKey by linking to __imp___ZTV9nsHashKey (auto-import ) nsSpecialSystemDirectory.o(.text$_ZN9nsHashKeyC2Ev+0xb): In function `ZNK20Syste mDirectoriesKey5CloneEv': c:/mozilla/source/mozilla/xpcom/obsolete/nsSpecialSystemDirectory.cpp: variable 'vtable for nsHashKey' can't be auto-imported. Please read the documentation for ld's --enable-auto-import for details. Creating library file: libxpcom_compat.dll.a make[3]: *** [xpcom_compat.dll] Error 1 And no, I haven't read the suggested documentation, mainly because I thought "auto-import" was already set and happening.
I've given up on gcc3.3 from comment #19. However, I noticed that gcc 3.2.3 is now available from MinGW. Using that version, I see similar results as with gcc 2.95. Works fine when I set enable_optimize, and crashes when I don't.
this crash seems to occur with -O2 , and can only be prevent by using -O (default for --enable-optimize). i will try to do some diffs for the assembly and do so more testing to see which specific optimization option(s) seem to cause the problem to reoccur at -O2. I have a stacktrace of the crash if anyone thinks it would be helpful.
Will, if your stacktrace is different from that posted at http://bugzilla.mozilla.org/show_bug.cgi?id=134113#c287 then please add it to this bug. In fact, since I had trouble re-finding that trace, maybe you should add it anyway.
this is really weird. i can build successfully at -O2 when building without --disable-debug , but not with it. my .mozconfig is the same in all other respects and i am doing totally clean builds.
Attached patch force -O Splinter Review
Whoops. I meant to add that this is the workaround that I've been using on my builds for a couple of months now. I normally build with --enable-optimize=-O2 without a problem when using the patch.
*** Bug 218900 has been marked as a duplicate of this bug. ***
My build seems to have started working, possibly due to applying that patch.
Attached patch patch (obsolete) — Splinter Review
This solves the problem for real for me. I got the idea from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11893 , which is what I think we're hitting here.
Except that patch doesn't build on MSVC. Working on a fix.
This changes NS_STDCALL_FUNPROTO to take more arguments so that it can be implemented with typeof() on gcc, and using the more verbose prototype elsewhere. I'd really prefer to not have to use typeof()... while it's more compact, it requires having access to a method declared on the class which is of the type you're declaring a pointer to. I can't think of any situations in our code where you wouldn't have access to such a function, but I think it makes the declaration look less generic.
Attachment #131408 - Attachment is obsolete: true
Attachment #131417 - Flags: superreview?(dbaron)
Attachment #131417 - Flags: review?(cls)
Attachment #131417 - Flags: superreview?(dbaron) → superreview+
Attachment #131417 - Flags: review?(cls) → review+
checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
My mingw builds and runs fine, also "fixes" bug 217124.
Status: RESOLVED → VERIFIED
Blocks: 251535
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: