Closed Bug 590683 Opened 14 years ago Closed 14 years ago

connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because gcc 4.5 is broken

Categories

(Core :: js-ctypes, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: wolfiR, Assigned: glandium)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 1 obsolete file)

If I try to connect Sync with Firefox 4.0b4 it crashes in libnss3:

WeaveCrypto: _deriveKeyFromPassphrase() called.

Program received signal SIGSEGV, Segmentation fault.
pk11_RawPBEKeyGenWithKeyType (slot=0xa6a0a400, type=944, params=0x9e4db4d0, keyType=31, keyLen=32, pwitem=0x0, wincx=0x0) at pk11pbe.c:1305
1305            pbev2_params->pPassword = pwitem->data;
(gdb) bt
#0  pk11_RawPBEKeyGenWithKeyType (slot=0xa6a0a400, type=944, params=0x9e4db4d0, keyType=31, keyLen=32, pwitem=0x0, wincx=0x0) at pk11pbe.c:130
#1  0xb5e93922 in PK11_PBEKeyGen (slot=0xa6a0a400, algid=0x9d7a7f20, pwitem=0x0, faulty3DES=0, wincx=0x0) at pk11pbe.c:1392
#2  0xb7a7b95e in ffi_call_SYSV () from /usr/lib/xulrunner-2.0b4/libmozjs.so

I also get the following error in the error console which might be related.
Error: cannot read contents of null pointer
Source File: file:///usr/lib/firefox/components/WeaveCrypto.js
Line: 1004
Line 1004
   let intData = ctypes.cast(data, ctypes.uint8_t.array(len).ptr).contents;
Keywords: crash
Summary: connect to sync crashes Firefox in libnss3 → connect to sync crashes Firefox in libnss3 [@ pk11_RawPBEKeyGenWithKeyType]
Mike, have you seen something like this by any chance? Installed system NSS is 3.12.8pre
Sync works for me on 4.0b4 with libnss 3.12.8beta2. But I only tested on x86-64, not x86.
hmm, interesting. I've tried it on a x86-64 installation now and it doesn't crash.
I need to test a bit more then. My "master" Firefox is a 64bit one.
I'll need to test another 32bit installation to see if it's the architecture or something else.
Tried it with a new VM 32-bit. And Firefox crashes when sync data is to be imported.
It really seems to me that something is not arch agnostic here.
I just tried in a 32-bits chroot, and it worked properly for me.
Hmm,

http://mxr.mozilla.org/mozilla-central/source/services/crypto/WeaveCrypto.js#1043
let passItem = this.makeSECItem(passphrase, false);

returns a null pointer here.
passphrase is set correctly.

http://mxr.mozilla.org/mozilla-central/source/services/crypto/WeaveCrypto.js#1038
return new this.nss_t.SECItem(this.nss.SIBUFFER, outputData, outputData.length);

This returns NULL while the input (actually outputData) looks sane.
I'm failing to further debug this. Not sure where to set a breakpoint.
Did you confirm if a Mozilla-produced nightly build crashes? [From IRC you said this was with your own build.]
Firefox 4.0b4 as shipped from Mozilla does not crash.
(Good for you but doesn't help me much)
What's different about your build env? CFLAGS etc?
Compiler  	Version  	Compiler flags
gcc 	gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) 	-Wall -W -Wno-unused -Wpointer-arith -Wcast-align -W -pedantic -Wno-long-long -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Os -fno-strict-aliasing-pthread -pipe -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer -finline-limit=50
c++ 	gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) 	-fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -pedantic -Wno-long-long -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Os -fno-strict-aliasing -fno-strict-aliasing -fshort-wchar -pthread -pipe -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer -finline-limit=50


Configure arguments
--enable-application=xulrunner --prefix=/usr --libdir=/usr/lib --sysconfdir=/etc --mandir=/usr/share/man --includedir=/usr/include --enable-optimize --enable-extensions=default --with-system-jpeg --with-system-zlib --with-l10n-base=../l10n --disable-tests --disable-mochitest --disable-installer --disable-updater --disable-javaxpcom --enable-startup-notification --enable-url-classifier --enable-system-hunspell --enable-shared-js --with-system-nspr --with-system-nss --enable-libproxy
Could that be related to -D_FORTIFY_SOURCE=2 -fstack-protector ?
... or gcc 4.5. These are the 2 main differences I see with both Debian builds and Mozilla builds.
(In reply to comment #14)
> Could that be related to -D_FORTIFY_SOURCE=2 -fstack-protector ?

We have that since ages (which doesn't rule it out though).

(In reply to comment #15)
> ... or gcc 4.5. These are the 2 main differences I see with both Debian builds
> and Mozilla builds.

I can actually try a build on openSUSE 11.2 with an older gcc to see if it works there.
Running a build from openSUSE 11.2 (pre gcc4.5) on openSUSE 11.3 (gcc4.5) does not crash. So it's most likely compiler toolchain related.
(In reply to comment #17)
> Running a build from openSUSE 11.2 (pre gcc4.5) on openSUSE 11.3 (gcc4.5) does
> not crash. So it's most likely compiler toolchain related.

That could be the compiler breaking libffi (not unheard of). What you can try to do is build against system libffi (bug 551138) with gcc 4.5, and try with libffi itself built with gcc < 4.5 and gcc 4.5.
Summary: connect to sync crashes Firefox in libnss3 [@ pk11_RawPBEKeyGenWithKeyType] → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5
Or hack the configure arguments to libffi (in js/src/configure.in) to give it CC=gcc4.3.

It'd be somewhat hilarious if libffi was broken with 4.5, given that it lives in the gcc tree. ;)

I'm thinking it's something else, but you should definitely rule this out first.
Taras says other people have seen this... libffi is broken with 4.5. (Good job guys!)
Component: Crypto → js-ctypes
Product: Weave → Core
QA Contact: crypto → js-ctypes
Summary: connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because libffi is broken
Version: unspecified → Trunk
Blocks: gcc4.5
Is this fixed with gcc trunk? If so let's dupe bug 590683 to this.
Also, we should check if 4.1 branch tip works. (Small chance, but worth a shot!) If so, we can just wait for the next point release, or pull branch tip directly.
(In reply to comment #22)
> Also, we should check if 4.1 branch tip works.

You mean 4.5 branch tip, right? ;-)
Um. Yes. :(
I can spend cycles on this if it blocks, say, betaN. (I think it should, since we can't really deploy 4.5 with this!)

I don't think it'll be that hard to track down, but getting the gcc fix (unless we can work around it in libffi) might take some doing. That alone might make this a non-starter for 4. :(
blocking2.0: --- → ?
(In reply to comment #25)
> I can spend cycles on this if it blocks, say, betaN. (I think it should, since
> we can't really deploy 4.5 with this!)
> 
> I don't think it'll be that hard to track down, but getting the gcc fix (unless
> we can work around it in libffi) might take some doing. That alone might make
> this a non-starter for 4. :(

Problem is that I expect linux distributions will be shipping ff with 4.5(some already are), so whether we switch to 4.5 or not, users will suffer at distribution's mercy.
blocking2.0: ? → betaN+
Assignee: nobody → dwitte
Status: NEW → ASSIGNED
blocking2.0: betaN+ → ?
blocking2.0: ? → betaN+
I can reproduce this on 4.5.1 opt linux x86_64. Not debug.

We can't get around this by simply building libffi without -O2; I still see the problem. So this is related to the code generation of the function being called by the test (sum_many_bool_cdecl).

I have assembly dumps from 4.4 opt and 4.5.1 opt, but they're rather different, so it might take a bit to grok them.
This goes away at -O0, -O1, and -O2; it only occurs for -Os. Manually splitting out -Os into all its constituent optimization flags passes, so the gcc docs are out of date or -Os is doing other magic that you can't get otherwise.

I can't reproduce using a .c file that contains the relevant function and calls it, so the bug here is in the interaction between libffi and gcc.
Attached patch WorkaroundSplinter Review
Building CTypes.o with -O2 -finline-functions is enough to break jsctypes. -O3 enables it. Note that depending on how you were setting your build flags, you may not have been overriding the flags set in js/src/Makefile.in, which uses -O3 and some other flags.

Forcing to build CTypes.o (only) with -O2 -fstrict-aliasing -fomit-frame-pointer makes everything work for me (xpcshell-tests and Firefox Sync).
Attached file Reduced testcase
FWIW, here is a reduced testcase (reduced from test_jsctypes.js) that exhibits the problem.
Note that I *can't* reproduce on x86-64 with gcc 4.5.1. Only x86.
I identified where in the CTypes.cpp code the problem is occuring:
in CData::Address:
  // Manually set the pointer inside the object, so we skip the conversion step.
  void** data = static_cast<void**>(GetData(cx, result));
  *data = GetData(cx, obj);

The wierdest thing is that when adding a printf after that for the value of *data, it prints the expected value. But when the value is read in PointerType::ContentsGetter, it's NULL. It turns out at the assembly level (the one gcc 4.5 generates), it actually never assigns *data! It also appears to be a gcc bug, not a C++ backend bug.
I wrote a simple (and scary) testcase to send to the upstream gcc bug.
Summary: connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because libffi is broken → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because gcc 4.5 is broken
(In reply to comment #32)
> I identified where in the CTypes.cpp code the problem is occuring:
> in CData::Address:
>   // Manually set the pointer inside the object, so we skip the conversion
> step.
>   void** data = static_cast<void**>(GetData(cx, result));
>   *data = GetData(cx, obj);

The bug only triggers when GetData is inlined. So if you add the inline keyword to the GetData function definition, it starts breaking at -O1.
We could take a workaround in our tree to disable the relevant optimization, but if we do, it should be treewide -- since we have no idea what other code gcc is generating incorrectly (and the use of jsval is rather prevalent!). Per Taras, since we're deploying a custom gcc 4.5 anyway, it wouldn't be hard to add a gcc patch locally so I think that's preferable. If we don't get a fix soon we can reconsider the workaround.
Attached patch testcase (obsolete) — Splinter Review
Attachment #476399 - Attachment is obsolete: true
Marking fixed since the relevant patch is in trunk and 4.5 branch gcc, and we've pulled the branch patch into our 4.5 for when we deploy.
Assignee: dwitte → mh+mozilla
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Severity: normal → critical
Crash Signature: [@ pk11_RawPBEKeyGenWithKeyType]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: