connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because gcc 4.5 is broken

RESOLVED FIXED

Status

()

Core
js-ctypes
--
critical
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: wolfiR, Assigned: glandium)

Tracking

({crash})

Trunk
x86
Linux
crash
Points:
---

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

7 years ago
If I try to connect Sync with Firefox 4.0b4 it crashes in libnss3:

WeaveCrypto: _deriveKeyFromPassphrase() called.

Program received signal SIGSEGV, Segmentation fault.
pk11_RawPBEKeyGenWithKeyType (slot=0xa6a0a400, type=944, params=0x9e4db4d0, keyType=31, keyLen=32, pwitem=0x0, wincx=0x0) at pk11pbe.c:1305
1305            pbev2_params->pPassword = pwitem->data;
(gdb) bt
#0  pk11_RawPBEKeyGenWithKeyType (slot=0xa6a0a400, type=944, params=0x9e4db4d0, keyType=31, keyLen=32, pwitem=0x0, wincx=0x0) at pk11pbe.c:130
#1  0xb5e93922 in PK11_PBEKeyGen (slot=0xa6a0a400, algid=0x9d7a7f20, pwitem=0x0, faulty3DES=0, wincx=0x0) at pk11pbe.c:1392
#2  0xb7a7b95e in ffi_call_SYSV () from /usr/lib/xulrunner-2.0b4/libmozjs.so

I also get the following error in the error console which might be related.
(Reporter)

Comment 1

7 years ago
Error: cannot read contents of null pointer
Source File: file:///usr/lib/firefox/components/WeaveCrypto.js
Line: 1004
(Reporter)

Comment 2

7 years ago
Line 1004
   let intData = ctypes.cast(data, ctypes.uint8_t.array(len).ptr).contents;
(Reporter)

Updated

7 years ago
Keywords: crash
Summary: connect to sync crashes Firefox in libnss3 → connect to sync crashes Firefox in libnss3 [@ pk11_RawPBEKeyGenWithKeyType]
(Reporter)

Comment 3

7 years ago
Mike, have you seen something like this by any chance? Installed system NSS is 3.12.8pre
(Assignee)

Comment 4

7 years ago
Sync works for me on 4.0b4 with libnss 3.12.8beta2. But I only tested on x86-64, not x86.
(Reporter)

Comment 5

7 years ago
hmm, interesting. I've tried it on a x86-64 installation now and it doesn't crash.
I need to test a bit more then. My "master" Firefox is a 64bit one.
I'll need to test another 32bit installation to see if it's the architecture or something else.
(Reporter)

Comment 6

7 years ago
Tried it with a new VM 32-bit. And Firefox crashes when sync data is to be imported.
It really seems to me that something is not arch agnostic here.
(Assignee)

Comment 7

7 years ago
I just tried in a 32-bits chroot, and it worked properly for me.
(Reporter)

Comment 8

7 years ago
Hmm,

http://mxr.mozilla.org/mozilla-central/source/services/crypto/WeaveCrypto.js#1043
let passItem = this.makeSECItem(passphrase, false);

returns a null pointer here.
passphrase is set correctly.

http://mxr.mozilla.org/mozilla-central/source/services/crypto/WeaveCrypto.js#1038
return new this.nss_t.SECItem(this.nss.SIBUFFER, outputData, outputData.length);

This returns NULL while the input (actually outputData) looks sane.
(Reporter)

Comment 9

7 years ago
I'm failing to further debug this. Not sure where to set a breakpoint.
Did you confirm if a Mozilla-produced nightly build crashes? [From IRC you said this was with your own build.]
(Reporter)

Comment 11

7 years ago
Firefox 4.0b4 as shipped from Mozilla does not crash.
(Good for you but doesn't help me much)

Comment 12

7 years ago
What's different about your build env? CFLAGS etc?
(Reporter)

Comment 13

7 years ago
Compiler  	Version  	Compiler flags
gcc 	gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) 	-Wall -W -Wno-unused -Wpointer-arith -Wcast-align -W -pedantic -Wno-long-long -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Os -fno-strict-aliasing-pthread -pipe -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer -finline-limit=50
c++ 	gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) 	-fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -pedantic -Wno-long-long -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Os -fno-strict-aliasing -fno-strict-aliasing -fshort-wchar -pthread -pipe -DNDEBUG -DTRIMMED -Os -freorder-blocks -fomit-frame-pointer -finline-limit=50


Configure arguments
--enable-application=xulrunner --prefix=/usr --libdir=/usr/lib --sysconfdir=/etc --mandir=/usr/share/man --includedir=/usr/include --enable-optimize --enable-extensions=default --with-system-jpeg --with-system-zlib --with-l10n-base=../l10n --disable-tests --disable-mochitest --disable-installer --disable-updater --disable-javaxpcom --enable-startup-notification --enable-url-classifier --enable-system-hunspell --enable-shared-js --with-system-nspr --with-system-nss --enable-libproxy
(Assignee)

Comment 14

7 years ago
Could that be related to -D_FORTIFY_SOURCE=2 -fstack-protector ?
(Assignee)

Comment 15

7 years ago
... or gcc 4.5. These are the 2 main differences I see with both Debian builds and Mozilla builds.
(Reporter)

Comment 16

7 years ago
(In reply to comment #14)
> Could that be related to -D_FORTIFY_SOURCE=2 -fstack-protector ?

We have that since ages (which doesn't rule it out though).

(In reply to comment #15)
> ... or gcc 4.5. These are the 2 main differences I see with both Debian builds
> and Mozilla builds.

I can actually try a build on openSUSE 11.2 with an older gcc to see if it works there.
(Reporter)

Comment 17

7 years ago
Running a build from openSUSE 11.2 (pre gcc4.5) on openSUSE 11.3 (gcc4.5) does not crash. So it's most likely compiler toolchain related.
(Assignee)

Comment 18

7 years ago
(In reply to comment #17)
> Running a build from openSUSE 11.2 (pre gcc4.5) on openSUSE 11.3 (gcc4.5) does
> not crash. So it's most likely compiler toolchain related.

That could be the compiler breaking libffi (not unheard of). What you can try to do is build against system libffi (bug 551138) with gcc 4.5, and try with libffi itself built with gcc < 4.5 and gcc 4.5.
(Reporter)

Updated

7 years ago
Summary: connect to sync crashes Firefox in libnss3 [@ pk11_RawPBEKeyGenWithKeyType] → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5

Comment 19

7 years ago
Or hack the configure arguments to libffi (in js/src/configure.in) to give it CC=gcc4.3.

It'd be somewhat hilarious if libffi was broken with 4.5, given that it lives in the gcc tree. ;)

I'm thinking it's something else, but you should definitely rule this out first.

Comment 20

7 years ago
Taras says other people have seen this... libffi is broken with 4.5. (Good job guys!)

Updated

7 years ago
Component: Crypto → js-ctypes
Product: Weave → Core
QA Contact: crypto → js-ctypes
Summary: connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because libffi is broken
Version: unspecified → Trunk
(Assignee)

Updated

7 years ago
Blocks: 559964

Comment 21

7 years ago
Is this fixed with gcc trunk? If so let's dupe bug 590683 to this.

Comment 22

7 years ago
Also, we should check if 4.1 branch tip works. (Small chance, but worth a shot!) If so, we can just wait for the next point release, or pull branch tip directly.

Comment 23

7 years ago
(In reply to comment #22)
> Also, we should check if 4.1 branch tip works.

You mean 4.5 branch tip, right? ;-)

Comment 24

7 years ago
Um. Yes. :(

Comment 25

7 years ago
I can spend cycles on this if it blocks, say, betaN. (I think it should, since we can't really deploy 4.5 with this!)

I don't think it'll be that hard to track down, but getting the gcc fix (unless we can work around it in libffi) might take some doing. That alone might make this a non-starter for 4. :(
blocking2.0: --- → ?

Comment 26

7 years ago
(In reply to comment #25)
> I can spend cycles on this if it blocks, say, betaN. (I think it should, since
> we can't really deploy 4.5 with this!)
> 
> I don't think it'll be that hard to track down, but getting the gcc fix (unless
> we can work around it in libffi) might take some doing. That alone might make
> this a non-starter for 4. :(

Problem is that I expect linux distributions will be shipping ff with 4.5(some already are), so whether we switch to 4.5 or not, users will suffer at distribution's mercy.
blocking2.0: ? → betaN+

Updated

7 years ago
Assignee: nobody → dwitte
Status: NEW → ASSIGNED
blocking2.0: betaN+ → ?
blocking2.0: ? → betaN+

Comment 27

7 years ago
I can reproduce this on 4.5.1 opt linux x86_64. Not debug.

We can't get around this by simply building libffi without -O2; I still see the problem. So this is related to the code generation of the function being called by the test (sum_many_bool_cdecl).

I have assembly dumps from 4.4 opt and 4.5.1 opt, but they're rather different, so it might take a bit to grok them.

Comment 28

7 years ago
This goes away at -O0, -O1, and -O2; it only occurs for -Os. Manually splitting out -Os into all its constituent optimization flags passes, so the gcc docs are out of date or -Os is doing other magic that you can't get otherwise.

I can't reproduce using a .c file that contains the relevant function and calls it, so the bug here is in the interaction between libffi and gcc.
(Assignee)

Comment 29

7 years ago
Created attachment 475076 [details] [diff] [review]
Workaround

Building CTypes.o with -O2 -finline-functions is enough to break jsctypes. -O3 enables it. Note that depending on how you were setting your build flags, you may not have been overriding the flags set in js/src/Makefile.in, which uses -O3 and some other flags.

Forcing to build CTypes.o (only) with -O2 -fstrict-aliasing -fomit-frame-pointer makes everything work for me (xpcshell-tests and Firefox Sync).
(Assignee)

Comment 30

7 years ago
Created attachment 475079 [details]
Reduced testcase

FWIW, here is a reduced testcase (reduced from test_jsctypes.js) that exhibits the problem.
(Assignee)

Comment 31

7 years ago
Note that I *can't* reproduce on x86-64 with gcc 4.5.1. Only x86.
(Assignee)

Comment 32

7 years ago
I identified where in the CTypes.cpp code the problem is occuring:
in CData::Address:
  // Manually set the pointer inside the object, so we skip the conversion step.
  void** data = static_cast<void**>(GetData(cx, result));
  *data = GetData(cx, obj);

The wierdest thing is that when adding a printf after that for the value of *data, it prints the expected value. But when the value is read in PointerType::ContentsGetter, it's NULL. It turns out at the assembly level (the one gcc 4.5 generates), it actually never assigns *data! It also appears to be a gcc bug, not a C++ backend bug.
I wrote a simple (and scary) testcase to send to the upstream gcc bug.
Summary: connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because libffi is broken → connect to sync crashes Firefox in [@ pk11_RawPBEKeyGenWithKeyType] with gcc 4.5 because gcc 4.5 is broken
(Assignee)

Comment 33

7 years ago
(In reply to comment #32)
> I identified where in the CTypes.cpp code the problem is occuring:
> in CData::Address:
>   // Manually set the pointer inside the object, so we skip the conversion
> step.
>   void** data = static_cast<void**>(GetData(cx, result));
>   *data = GetData(cx, obj);

The bug only triggers when GetData is inlined. So if you add the inline keyword to the GetData function definition, it starts breaking at -O1.
(Assignee)

Updated

7 years ago

Comment 34

7 years ago
We could take a workaround in our tree to disable the relevant optimization, but if we do, it should be treewide -- since we have no idea what other code gcc is generating incorrectly (and the use of jsval is rather prevalent!). Per Taras, since we're deploying a custom gcc 4.5 anyway, it wouldn't be hard to add a gcc patch locally so I think that's preferable. If we don't get a fix soon we can reconsider the workaround.

Comment 35

7 years ago
Created attachment 476399 [details] [diff] [review]
testcase

Updated

7 years ago
Attachment #476399 - Attachment is obsolete: true

Comment 36

7 years ago
Marking fixed since the relevant patch is in trunk and 4.5 branch gcc, and we've pulled the branch patch into our 4.5 for when we deploy.
Assignee: dwitte → mh+mozilla
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Updated

7 years ago
Severity: normal → critical

Updated

7 years ago
Duplicate of this bug: 625774
Crash Signature: [@ pk11_RawPBEKeyGenWithKeyType]
You need to log in before you can comment on or make changes to this bug.