Closed Bug 872013 Opened 12 years ago Closed 12 years ago

SIPCC fails to start if it can't mkdir the directory it wants in /tmp

Categories

(Core :: WebRTC: Signaling, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla24

People

(Reporter: abr, Assigned: abr)

Details

(Keywords: intermittent-failure, Whiteboard: [WebRTC], [blocking-webrtc-] [qa-])

Attachments

(2 files, 1 obsolete file)

Attached file Failure Log
When running repeated gUM/PC mochi tests under ASAN, jesup encountered an as-yet-unseen failure that ultimately resulted in a timeout in the "throwInCallbacks" test case (most likely because it was the last test run).

The telltale marker in this log is:

> [GSM Task|sdp_config] sdp_config.c:26: SDP: Invalid Config pointer.

This message can arise when the config object is NULL or uninitialized (which should cause emission of a message "sdp_init_config() failure") or when it has been corrupted by being overwritten (which would be worrisome).

This log message is also curious:

> [SIP MsgQueueWait task|ccsip] sip_platform_task.c:275: SIP : sip_platform_task_msgqwait : timeout waiting for listening IPC socket ready, exiting

It arises as a consequence of the following:

> [SIPStack task|ccsip] sip_platform_task.c:508: SIP : sip_platform_task_loop : failed to create temp dir

After such a failure, the sip_platform_task_loop returns, which prevents the IPC that the SIP MsgQueueWait expects to be set up from being set up.

Note that this message itself is an indication that mkdir("/tmp/SIP-%d",0700) failed (where %d is replaced with the PID). This is not a system call I would expect to fail under any but the most dire circumstances. I suspect the issue we're seeing here is collateral damage from corruption inflicted elsewhere in the code.

In any case, it's probably worth improving the output of these specific failure points to help shed light on this set of circumstances, should it arise again.
Attached patch Enhance failure logging (obsolete) — Splinter Review
Assignee: nobody → adam
Attachment #749315 - Flags: review?(ethanhugg)
Note: IIRC this was not ASAN and not part of a long sequence of mochitests; it was part of a non-ASAN last-minute check of mochitests before putting something up for review, IIRC.  The problem did not repeat.

The comment about mkdir is likely the cause - I've noticed failed/crashed/^c'd (I think) sessions leaving /tmp/SIP-* dirs, which can cause mkdir to fail.  If we're not using a mktmp/etc type thing, we need to handle collisions....
Comment on attachment 749315 [details] [diff] [review]
Enhance failure logging

Review of attachment 749315 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/sipcc/core/sipstack/sip_platform_task.c
@@ +506,5 @@
>        PR_snprintf(stmpdir, sizeof(stmpdir), template, getpid());
>  
>        if (mkdir(stmpdir, 0700) != 0) {
> +          CCSIP_DEBUG_ERROR(SIP_F_PREFIX"failed to create temp dir: %s (%s)",
> +                            fname, stmpdir, strerror(errno));

You may want to run a Try for Windows because this is the first instance of strerror in the signaling code that will be compiled on Windows.  I know Windows compilers try to get you to use strerror_s, but not sure if it's just a warning or error they throw.
Attachment #749315 - Flags: review?(ethanhugg) → review+
(In reply to Ethan Hugg [:ehugg] from comment #3)
> You may want to run a Try for Windows because this is the first instance of
> strerror in the signaling code that will be compiled on Windows.

Good catch. Here's the try push:

https://tbpl.mozilla.org/?tree=Try&rev=108061b1a44e
(In reply to Randell Jesup [:jesup] from comment #2)
> Note: IIRC this was not ASAN and not part of a long sequence of mochitests;
> it was part of a non-ASAN last-minute check of mochitests before putting
> something up for review, IIRC.  The problem did not repeat.
> 
> The comment about mkdir is likely the cause - I've noticed
> failed/crashed/^c'd (I think) sessions leaving /tmp/SIP-* dirs, which can
> cause mkdir to fail.  If we're not using a mktmp/etc type thing, we need to
> handle collisions....

Yep, that's almost certainly the root cause here. The item in question is supposed to be initialized in the sip_platform_task_loop, but it happens after the return-due-to-failed-mkdir happens.

I'd like to land the diagnostic patch anyway, but I'll put together something that deals more gracefully with this situation.
Summary: Intermittent test_peerConnection_throwInCallbacks.html | Test timed out after "SDP: Invalid Config pointer." → SIPCC fails to start if it can't mkdir the directory it wants in /tmp
Attachment #749315 - Attachment is obsolete: true
Comment on attachment 749515 [details] [diff] [review]
Use more robust mkdtemp() rather than relying on PID

I have an r+ from ehugg on the logging (and the push try shows that things work just fine under Windows) -- I just need a review on the actual fix, which migrates from using the PID for creating a unique tmpdir name to using the POSIX-standard mkdtemp().
Attachment #749515 - Flags: review?(rjesup)
Attachment #749515 - Flags: review?(rjesup) → review+
I'm pushing this one to try also (cross-platform this time) just to make sure we don't get any unexpected behavior from mkdtemp(). It might be POSIX, but that doesn't mean everyone got it right...

https://tbpl.mozilla.org/?tree=Try&rev=80d6726a15d8
Nice and green all across the board. Landing.

https://hg.mozilla.org/integration/mozilla-inbound/rev/f5e13f8aa8cf
https://hg.mozilla.org/mozilla-central/rev/f5e13f8aa8cf
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Whiteboard: [WebRTC], [blocking-webrtc-] → [WebRTC], [blocking-webrtc-] [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: