Bug 710466 - sasl-browserid causing segfault in slapd
Product: Participation Infrastructure
Classification: Other
Component: Phonebook
Version: other
Platform: x86 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Depends on:
Blocks: 665373
Reported: 2011-12-13 16:53 PST by Austin King [:ozten]
Modified: 2012-05-09 13:11 PDT
CC List: 5 users
See Also:
QA Whiteboard:
Iteration: ---
Points: ---

valgrind output for kill segfault (522.43 KB, text/plain)
2011-12-14 14:29 PST, Austin King [:ozten]

Description Austin King [:ozten] 2011-12-13 16:53:14 PST
slapd will occasionally segfault. This has been happening since 11-23 when we deployed to

During development, I noticed slapd will segfault for:
* Being run in the foreground and sent a SIGINT (Ctrl-C)
* If a file it relies on is changed, such as recompiling sasl-browserid and installing
Comment 1 Austin King [:ozten] 2011-12-13 17:14:03 PST
Cleaned up syslogs of segfaults from today only.

I don't see anything interesting, but syslog isn't set to debug...
Comment 2 David Chan [:dchan] 2011-12-14 10:23:09 PST
The segfault looks similar to the one I received during testing.

Dec 13 16:29:18 mozillians1 kernel: slapd[14391]: segfault at 7f7d4a6d6680 ip 00007f7d4b1eba32 sp 00007fff95e17f78 error 4 in[7f7d4b16c000+186000]
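
For reference, the fields in the kernel line above can be decoded mechanically. The following is a minimal C sketch, not code from slapd or sasl-browserid; the helper names `fault_offset` and `describe_fault` are hypothetical. It assumes the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and that the bracketed value in the log is the mapping's base address plus its size.

```c
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error-code bits, as shown in the kernel's "error N" field. */
#define PF_PROT  0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u  /* 0: read access,      1: write access         */
#define PF_USER  0x4u  /* 0: kernel mode,      1: user mode            */

/* Hypothetical helper: offset of the faulting instruction inside the
 * mapped object, given the instruction pointer and the mapping base. */
static uint64_t fault_offset(uint64_t ip, uint64_t map_base)
{
    return ip - map_base;
}

/* Hypothetical helper: write a readable description of the error code
 * into buf (at most len bytes, always NUL-terminated for len > 0). */
static void describe_fault(unsigned err, char *buf, size_t len)
{
    snprintf(buf, len, "%s-mode %s, %s",
             (err & PF_USER)  ? "user"  : "kernel",
             (err & PF_WRITE) ? "write" : "read",
             (err & PF_PROT)  ? "protection violation" : "page not present");
}
```

With the values from the log line, `fault_offset(0x7f7d4b1eba32, 0x7f7d4b16c000)` gives 0x7fa32, and error 4 decodes as a user-mode read of a page that was not present. The offset can then be looked up in the mapped object with a tool such as addr2line.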

In my debugging the issue appeared to be from passing NULL to strlen() or a function that calls strlen(). The last gdb stack frame I got was for an optimized length intrinsic. Let me see if I can get the stack trace.

If this is the case, the SASL library may be expecting the plugin to perform certain error checking.
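
The kind of defensive check described above could look like the following. This is only a sketch under assumptions: `safe_strlen` and `check_required` are hypothetical helpers, not functions from sasl-browserid or the SASL library, and which plugin inputs can actually be NULL has not been confirmed in this bug.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of s, or 0 when s is NULL.
 * Calling strlen(NULL) directly is undefined behavior and
 * typically crashes. */
static size_t safe_strlen(const char *s)
{
    return s ? strlen(s) : 0;
}

/* Hypothetical validation step a plugin could run before handing a
 * string to library code: returns 0 when the value is usable,
 * -1 when it is NULL or empty. */
static int check_required(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return -1;
    return 0;
}
```

A caller would reject the request when `check_required(input)` returns -1 instead of passing the pointer on to code that calls strlen().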
Comment 3 Austin King [:ozten] 2011-12-14 10:53:02 PST
(In reply to David Chan [:dchan] from comment #2)

Do you recall the repro steps to cause a NULL input?

I can bulletproof the code, but it would be great to repro before doing so.
Comment 4 Austin King [:ozten] 2011-12-14 14:29:49 PST
Created attachment 581788 [details]
valgrind output for kill segfault

The attached valgrind output is created by:

valgrind -v --leak-check=full --show-reachable=yes slapd -d 64 -f slapd.conf -h ldap://:1389

then sending `kill $pid`.

This doesn't repro the mozillians-dev issue, but it is a known segfault mentioned earlier (Ctrl-C).
Comment 5 Austin King [:ozten] 2011-12-15 09:50:45 PST
Unable to repro with 30 concurrent requests using the same assertion, the same stale assertion, and the same garbage assertion.

Still digging.
Comment 6 Austin King [:ozten] 2011-12-15 15:05:48 PST
I have access to mozillians-dev master slapd server.

The segfaults look like shutdown segfaults, which lloyd has identified a fix for in

I'm now looking for patterns to explain why slapd was shut down manually or via automated scripts (instead of web requests causing the segfault).

On Rackspace:
We sent ~1000 concurrent requests with unique assertions/emails/passwords and couldn't repro the segfault.
Comment 7 Austin King [:ozten] 2011-12-19 14:24:10 PST
IT and I are pretty confident that:

1) There is a compatibility issue with sasl-browserid and doing replication over start-tls.
2) Segfaults were on server shutdown or restart, not serving traffic.

#2 has been fixed and #1 will be fixed outside of this bug. There is a workaround in place on mozillians-dev for #1.
Comment 8 Matt Brandt [:mbrandt] 2012-05-09 13:11:22 PDT
Bumping to verified per the passage of time, the [qa-] nature of the bug, and comment 7.
