Closed Bug 604653 Opened 14 years ago Closed 14 years ago

64bit Mac build crashes with negotiate auth and Kerberos [@ Kerberos@0x6a163 ]

Categories

(Core :: Networking, defect)

x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- final+

People

(Reporter: kenz.gelsoft, Assigned: Bienvenu)

References

Details

(Keywords: crash, platform-parity, regression, Whiteboard: [tb33needed])

Crash Data

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101011 Firefox/4.0b8pre
Build Identifier: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101011 Firefox/4.0b8pre

The 64-bit Mac build crashes on web pages that use Kerberos through negotiate auth.

 * Crash reports: http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-10-14%2002%3A00%3A00&signature=Kerberos%400x6a163&version=Firefox%3A4.0b8pre

I have experienced these crashes on two intranet servers at my company. I don't think this is a server-specific problem, since the two servers differ in configuration: one is Windows Server and the other is Apache with mod_auth_kerb on a Linux server.

I have NO reproducible URL on the Internet site. (Usually, Negotiate auth is used on the intranet site...)

Reproducible: Always

Steps to Reproduce:
1. Set up Firefox to use negotiate auth by setting network.negotiate-auth.trusted-uris.
2. Open a web page which uses Kerberos w/ negotiate auth.
Actual Results:  
Crashed right after I entered the web site url.

Expected Results:  
Don't crash, open the web page.

These days, the 32-bit and 64-bit Intel builds are unified into a single 32/64-bit universal binary, so a 64-bit-capable Mac always runs 64-bit Firefox and I can't work around this by using a 32-bit build.

I confirmed this problem with the first 64-bit Mac nightly build (20100408, 657bebceeb18), so this feature has never worked in 64-bit builds since DAY 1.
We're just doing initial set-up on Thunderbird 64 bit mac and we're seeing this crash in our unit tests as well.

If I understand them correctly, the unit tests are testing initiating a GSSAPI connection and checking it fails. The crash seems to be in a similar place to the crash above.

Requesting blocking 2.0 as this is a regression in Mac 64 bit builds (versus the 32 bit builds running on 10.6 mac) that is well defined and reproducible.


Here's the Unit test info from Thunderbird's test:

http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTry/1287195061.1287195921.10037.gz#err5

TEST-UNEXPECTED-FAIL | /builds/slave/tryserver-macosx64-opt-unittest-xpcshell/build/xpcshell/tests/mailnews/local/test/unit/test_pop3GSSAPIFail.js | test failed (with xpcshell return code: 1), see following log:
  >>>>>>>
  TEST-INFO | (xpcshell/head.js) | test 1 pending
Directory request for: MailD that we (mailDirService.js) are not handling, leaving it to another handler.
Directory request for: MFCaF that we (mailDirService.js) are not handling, leaving it to another handler.
Directory request for: DefRt that we (mailDirService.js) are not handling, leaving it to another handler.
TEST-INFO | (xpcshell/head.js) | test 2 pending
NEXT test is: GSSAPI auth, server with GSSAPI only

  <<<<<<<
PROCESS-CRASH | /builds/slave/tryserver-macosx64-opt-unittest-xpcshell/build/xpcshell/tests/mailnews/local/test/unit/test_pop3GSSAPIFail.js | application crashed (minidump found)
Operating system: Mac OS X
                  10.6.4 10F616
CPU: amd64
     family 6 model 23 stepping 6
     4 CPUs

Crash reason:  EXC_BAD_ACCESS / 0x0000000d
Crash address: 0x0

Thread 0 (crashed)
 0  0x7fffffe007bf
    rbx = 0x5fbfccb8   r12 = 0x0638a1a0   r13 = 0x01addf90   r14 = 0x0638a178
    r15 = 0x5fbfcca8   rip = 0xffe007bf   rsp = 0x5fbfcb30   rbp = 0x5fbfcb30
    Found by: given as instruction pointer in context
 1  Kerberos + 0x6a163
    rip = 0x880df164   rsp = 0x5fbfcb40
    Found by: stack scanning
 2  Kerberos + 0x67653
    rip = 0x880dc654   rsp = 0x5fbfcb70
    Found by: stack scanning
 3  libSystem.B.dylib + 0x6b19
    rip = 0x87b12b1a   rsp = 0x5fbfcb80
    Found by: stack scanning

Here's the crash stack from comment 0:

0  	 	@0x7fffffe007bf  	
1 	Kerberos 	Kerberos@0x6a163 	
2 	Kerberos 	Kerberos@0x67653 	
3 	libSystem.B.dylib 	libSystem.B.dylib@0x4f09 	
4 	XUL 	nsAuthGSSAPI::GetNextToken 	extensions/auth/nsAuthGSSAPI.cpp:429
5 	XUL 	nsHttpNegotiateAuth::GenerateCredentials 	extensions/auth/nsHttpNegotiateAuth.cpp:284
6 	XUL 	nsHttpChannelAuthProvider::GenCredsAndSetEntry 	netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:379
7 	XUL 	nsHttpChannelAuthProvider::GetCredentialsForChallenge 	netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:766
8 	XUL 	nsHttpChannelAuthProvider::GetCredentials 	netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:528
9 	XUL 	nsHttpChannelAuthProvider::ProcessAuthentication 	netwerk/protocol/http/nsHttpChannelAuthProvider.cpp:167
10 	XUL 	nsHttpChannel::ProcessResponse 	netwerk/protocol/http/nsHttpChannel.cpp:1090
11 	XUL 	nsHttpChannel::OnStartRequest 	netwerk/protocol/http/nsHttpChannel.cpp:3805
12 	XUL 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:441
13 	XUL 	nsInputStreamReadyEvent::Run
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Component: Networking: HTTP → Networking
Ever confirmed: true
Keywords: regression
QA Contact: networking.http → networking
Whiteboard: [tbtrunkneeds]
The crashing code being here:

http://hg.mozilla.org/mozilla-central/annotate/2593c8c8af8b/extensions/auth/nsAuthGSSAPI.cpp#l426

with the gss_import_name_ptr call (calls gss_import_name).
"regression" meaning that 1) it used to work on 64bit Mac builds and no longer does since a certain time/commit, or 2) it works on 32bit Mac and doesn't work on 64bit Mac? I don't think the latter is regression, but pp (platform parity).
(In reply to comment #3)
> "regression" meaning that 1) it used to work on 64bit Mac builds and no longer
> does since a certain time/commit, or 2) it works on 32bit Mac and doesn't work
> on 64bit Mac? I don't think the latter is regression, but pp (platform parity).

"regression" from the user perspective who don't care about if they are running 32 or 64 bit, but just want a running build.
Keywords: regression
Adding the stack so it gets picked up in crash-stats.
Summary: 64bit Mac build crashes with negotiate auth and Kerberos. → 64bit Mac build crashes with negotiate auth and Kerberos [@ Kerberos@0x6a163 ]
Blocks: 537496
If I had to guess, it would be that this definition isn't 64-bit safe:

static gss_OID_desc gss_c_nt_hostbased_service =
    { 10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x04" };
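For reference, the ten bytes in that initializer are just an encoded object identifier, so their meaning does not depend on pointer width. The sketch below decodes them into dotted form; `oid_to_dotted` and `hostbased_service_oid` are hypothetical names used only for this illustration and do not exist in nsAuthGSSAPI.cpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The OID bytes copied from the definition quoted above. */
const unsigned char hostbased_service_oid[10] = {
    0x2a, 0x86, 0x48, 0x86, 0xf7, 0x12, 0x01, 0x02, 0x01, 0x04
};

/* Decode DER-encoded OID content bytes into dotted-decimal text.
 * Returns 0 on success, -1 if the input is malformed or `out` is too small. */
int oid_to_dotted(const unsigned char *der, size_t len,
                  char *out, size_t out_size)
{
    size_t used = 0;
    unsigned long arc = 0;
    size_t i;

    assert(der != NULL && out != NULL);
    if (len == 0 || out_size == 0)
        return -1;

    for (i = 0; i < len; i++) {
        int n;

        /* Each arc is base-128; the high bit is set on all but its last byte. */
        arc = (arc << 7) | (unsigned long)(der[i] & 0x7f);
        if (der[i] & 0x80)
            continue;

        if (used == 0) {
            /* The first encoded value packs the first two arcs as 40*X + Y. */
            unsigned long first = arc < 80 ? arc / 40 : 2;
            n = snprintf(out, out_size, "%lu.%lu", first, arc - 40 * first);
        } else {
            n = snprintf(out + used, out_size - used, ".%lu", arc);
        }
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
        arc = 0;
    }

    /* A continuation bit on the final byte means the encoding is truncated. */
    return (der[len - 1] & 0x80) ? -1 : 0;
}
```

Calling `oid_to_dotted(hostbased_service_oid, sizeof hostbased_service_oid, buf, sizeof buf)` with a 64-byte `buf` yields `1.2.840.113554.1.2.1.4`, the hostbased service name type. The byte string is fine on any architecture; what can differ between builds is how the structure holding the length and the pointer is laid out.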
Changing OS to All since Win64 build also crashes.

But STR is a bit different and I couldn't get the crash report. So it can be another problem.

STR on Win64 build:
 1. set up Firefox to use negotiate auth by setting
network.negotiate-auth.trusted-uris.

Immediately, it crashed.
OS: Mac OS X → All
I've got a patch pushed to our try server to see what effect replacing &gss_c_nt_hostbased_service with GSS_C_NT_HOSTBASED_SERVICE has.

My suspicion is that although Kerberos systems may not have always defined GSS_C_NT_HOSTBASED_SERVICE in the past, they may do now.
Great, Mark.  Marking blocking and assigning to you!  :)
Assignee: nobody → bugzilla
blocking2.0: ? → final+
Keywords: regression
The bad news is attempt 1 didn't work, so I've got to rethink it a bit.
Whiteboard: [tbtrunkneeds] → [tb33needs]
Some drive by comments, as I unfortunately don't have time to investigate this in detail.

I doubt that it's the definition of gss_c_nt_hostbased_service that's the problem, as that's exactly the way that is defined in the Kerberos source.

Has anyone checked that the PRFuncPtr type and the related functions around it are 64-bit safe? We're blowing up on the first call we try to make into the dynamically loaded GSSAPI library, so I'm inclined to wonder if there's a problem there.

Another possibility is that we're not zeroing minorStatus and the server. These are both output-only and so _shouldn't_ need to be zeroed, but it might be worth giving that a go.
I just wanted to add that this bug only showed up when Beta 7 came out.  I have had no problems with FF on 64-bit Mac until now (Beta 7 works on our intranet in 64-bit Linux and in 32-bit Windows, though).

This is a 4.0 showstopper, as I had to switch back to FF 3.6.  Now I can't access any internal intranet websites at work, and for me FF is the only browser on the Mac that handles Kerberos correctly.

my 2 cents.
Kris, are you saying it worked with FF beta6 Mac 64bit? Or that it worked in Mac 32bit?
If the former, could you please check which was the last build (ideally down to a day, using nightlies) that still worked, and which is the first build that broke? That would be very helpful in finding the bug.
It worked on 64-bit Mac for FF 4.0 beta 6. I'm assuming I have been using 64-bit Mac all along, since that is my platform (Snow Leopard).  I had been using the FF 4.0 betas since the first one, and this is the first time it stopped working.  A week or two before beta 7 came out, I tried a nightly, and that crashed too; then it started crashing for beta 7 when that came out.
> I assuming I have been using
> 64-bit mac all along, since that is my platform (snow leopard)

Not a good assumption, see comment 3 / 4. We're in the transition from 32bit to 64bit builds of Mozilla. You may well have been using 32bit applications on a 64bit operating system (Mac OS X). Please check and answer comment 13.
I went and checked some of the nightly builds, and the regular Mac ones (32-bit, I assume) work with Kerberos, while the ones labeled 64-bit do not.  I checked a few back as far as Sept 15, and the 64-bit ones all die for me.

Sorry for any confusion.
ok. Thanks for checking.
I took another look at this today and zeroed minor status and major status, but that didn't help either. This is what I actually pushed to Thunderbird's try server: http://hg.mozilla.org/try-comm-central/rev/09a8f3c94b5a

Also, now we've released Alpha 1, we're seeing a few crashes, not sure if the stacks will help with anything:

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=Kerberos%400x6a163&version=Thunderbird%3A3.3a1


Unfortunately I'm not getting the time to push this forward as I'd like. I'm aware this is a blocker, but I have other priorities as well, and I don't have 10.6 locally, so I'm not ideally placed to debug this anyway. Therefore I'm re-assigning this to nobody in the hope that someone can take up looking at it.

I'm still quite happy to push potential fixes to our try server to see if they fix the issue or not, and if I get time I'll pick it up, but I can't guarantee that at the moment.
Assignee: bugzilla → nobody
I should be able to try this out in the next few days.
I have what claims to be a 64 bit debug build on Snow Leopard, but the test does not fail. The "guessed" obj-dir is obj-x86_64-apple-darwin10.5.0 and configure says 64 bits yes. gdb is unhappy with the executable, claiming that the executable is of type i386:x86-64, but gdb was configured as x86_64-apple-darwin. I'm just using the default build options...so I'm not sure if it's just a problem with gdb, or if there's some problem with my build, which is making the test pass...
test passed with release build also.
David, did you comment out the bit of the test that currently stops it being run on Mac?
(In reply to comment #22)
> David, did you comment out the bit of the test that currently stops it being
> run on Mac?

Oh, thx, I did on my old mac, but not on the new one...
OK, I believe I've figured out the basic problem though I'm not sure what the code change will be. I looked at the gssapi.h file in Mac System Kerberos Framework and it has #pragma pack(push,2). If I add that and a corresponding #pragma pack(pop) to our own copy of gssapi.h, then we no longer crash. I don't know if that will work for all mac builds, or just the 64 bit builds, and I'm not sure what #ifdef to use for just 64 bit mac builds. I suppose I could try doing a 3.1.x build which won't be 64 bit and see what #ifdef works...
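A small sketch of why that pragma would matter, assuming the descriptor has the usual shape of a 32-bit length followed by a pointer. The type and function names here are made up for illustration and are not the ones in gssapi.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an OID descriptor laid out with the compiler's default
 * alignment, as a header without the pragma would produce. */
typedef struct {
    uint32_t length;
    void    *elements;
} oid_natural;

/* The same fields declared under 2-byte packing, as with
 * #pragma pack(push,2) in the system header. */
#pragma pack(push, 2)
typedef struct {
    uint32_t length;
    void    *elements;
} oid_packed2;
#pragma pack(pop)

size_t natural_elements_offset(void) { return offsetof(oid_natural, elements); }
size_t packed2_elements_offset(void) { return offsetof(oid_packed2, elements); }
size_t natural_size(void)            { return sizeof(oid_natural); }
size_t packed2_size(void)            { return sizeof(oid_packed2); }

/* Sanity check: under 2-byte packing no padding follows `length`. */
void check_packed_layout(void)
{
    assert(offsetof(oid_packed2, elements) == sizeof(uint32_t));
}

/* Nonzero when both declarations describe the same bytes in memory. */
int layouts_agree(void)
{
    return natural_elements_offset() == packed2_elements_offset()
        && natural_size() == packed2_size();
}
```

On a typical 64-bit target the default layout pads `length` out to 8 bytes so the pointer is 8-byte aligned, while the packed layout puts the pointer at offset 4. On a typical 32-bit target both place it at offset 4, which would be consistent with the 32-bit builds not crashing.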
Attached patch: proposed fix
this fixes the crash in our unit test w/ 64 bit builds. I need to check that the test still works on 32 bit builds, but I suspect that pragma mainly affects the 64 bit builds.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Comment on attachment 496522 [details] [diff] [review]
proposed fix

Our unit test passes on 32 bit mac builds. I can't find an owner for this module, so I've just picked kaie, who recently changed related files...Kaie, if you want someone else to review this, please let me know, thx!
Attachment #496522 - Flags: review?(kaie)
This change looks correct to me.

Given that we dynamically load our GSSAPI module, it will cause problems if we end up loading non-Apple GSSAPI libraries (such as a locally built MIT or Heimdal distribution). However, I think the situations where we end up doing so are likely to be pretty rare, so I think this change is a good immediate fix.
Comment on attachment 496522 [details] [diff] [review]
proposed fix

ah, cool, thx, asking Simon for review, then, since he wrote the code :-)

I don't think there's any way to win with gssapi libraries on the mac that use different packing, since the packing is a compile time setting.
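To illustrate the compile-time point, here is a rough simulation of a caller and a library that were built with different packing. It is only a sketch: `caller_oid`, `library_oid` and `elements_seen_by_library` are hypothetical names, and the real library obviously reads the structure directly rather than through a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* How the caller's compiler lays out the descriptor (default alignment). */
typedef struct {
    uint32_t length;
    void    *elements;
} caller_oid;

/* How a library compiled under 2-byte packing expects the same descriptor. */
#pragma pack(push, 2)
typedef struct {
    uint32_t length;
    void    *elements;
} library_oid;
#pragma pack(pop)

/* Reinterpret the caller's bytes using the library's layout and return the
 * pointer the library would end up dereferencing. */
void *elements_seen_by_library(const caller_oid *from_caller)
{
    library_oid view;

    /* The packed layout is never larger, so this copy stays in bounds. */
    assert(sizeof(library_oid) <= sizeof(caller_oid));
    memcpy(&view, from_caller, sizeof view);
    return view.elements;
}
```

If the caller zero-fills a `caller_oid`, sets `length` and `elements`, and passes it in, the returned pointer matches `elements` only when both layouts put the field at the same offset. Where the offsets differ, the library sees a pointer assembled from padding and part of the real address, and nothing at run time can correct that because each side's offsets were fixed when it was compiled.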
Attachment #496522 - Flags: review?(kaie) → review?(simon)
Comment on attachment 496522 [details] [diff] [review]
proposed fix

The pragma code comes from what MIT provide to Apple, so I've posted to krbdev@mit.edu about this. I suspect it's also worth opening a radar, but I'm going to wait until MIT respond.

Of course, the problem is that if this goes away in a future Mac OS X release, we're going to end up revisiting this all over again.
Attachment #496522 - Flags: review?(simon) → review+
Comment on attachment 496522 [details] [diff] [review]
proposed fix

This fixes a crasher in both Firefox and Thunderbird for 64 bit mac builds.
Attachment #496522 - Flags: approval2.0?
This bug's status is blocking2.0 final+, so, seems you don't need the additional approval.
Comment on attachment 496522 [details] [diff] [review]
proposed fix

This has landed with a=blocking:

http://hg.mozilla.org/mozilla-central/rev/2705b22189f9
Attachment #496522 - Flags: approval2.0?
I have also backed out the change to comm-central that disabled the test that was previously crashing and the test now passes:

http://hg.mozilla.org/comm-central/rev/66ac1e4701b2

Hence marking bug as fixed.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [tb33needs] → [tb33needed]
Target Milestone: --- → mozilla2.0b8
thx, Mark, and Simon.
Crash Signature: [@ Kerberos@0x6a163 ]