If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

SIPCC fails to start if it can't mkdir the directory it wants in /tmp

RESOLVED FIXED in mozilla24

Status

()

Core
WebRTC: Signaling
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: abr, Assigned: abr)

Tracking

({intermittent-failure})

Trunk
mozilla24
x86
Mac OS X
intermittent-failure
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [WebRTC], [blocking-webrtc-] [qa-])

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

4 years ago
Created attachment 749281 [details]
Failure Log

When running repeated gUM/PC mochi tests under ASAN, jesup encountered an as-yet-unseen failure that ultimately resulted in a timeout in the "throwInCallbacks" test case (most likely because it was the last test run).

The telltale marker in this log is:

> [GSM Task|sdp_config] sdp_config.c:26: SDP: Invalid Config pointer.

This message can arise when the config object is NULL or uninitialized (which should cause emission of a message "sdp_init_config() failure") or when it has been corrupted by being overwritten (which would be worrisome).

This log message is also curious:

> [SIP MsgQueueWait task|ccsip] sip_platform_task.c:275: SIP : sip_platform_task_msgqwait : timeout waiting for listening IPC socket ready, exiting

It arises as a consequence of the following:

> [SIPStack task|ccsip] sip_platform_task.c:508: SIP : sip_platform_task_loop : failed to create temp dir

After such a failure, the sip_platform_task_loop returns, which prevents the IPC that the SIP MsgQueueWait expects to be set up from being set up.

Note that this message itself is an indication that mkdir("/tmp/SIP-%d",0700) failed (where %d is replaced with the PID). This is not a system call I would expect to fail under any but the most dire circumstances. I suspect the issue we're seeing here is collateral damage from corruption inflicted elsewhere in the code.

In any case, it's probably worth improving the output of these specific failure points to help shed light on this set of circumstances, should it arise again.

Updated

4 years ago
Keywords: intermittent-failure
(Assignee)

Comment 1

4 years ago
Created attachment 749315 [details] [diff] [review]
Enhance failure logging
(Assignee)

Updated

4 years ago
Assignee: nobody → adam
(Assignee)

Updated

4 years ago
Attachment #749315 - Flags: review?(ethanhugg)
Note: IIRC this was not ASAN and not part of a long sequence of mochitests; it was part of a non-ASAN last-minute check of mochitests before putting something up for review, IIRC.  The problem did not repeat.

The comment about mkdir is likely the cause - I've noticed failed/crashed/^c'd (I think) sessions leaving /tmp/SIP-* dirs, which can cause mkdir to fail.  If we're not using a mktmp/etc type thing, we need to handle collisions....

Comment 3

4 years ago
Comment on attachment 749315 [details] [diff] [review]
Enhance failure logging

Review of attachment 749315 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/sipcc/core/sipstack/sip_platform_task.c
@@ +506,5 @@
>        PR_snprintf(stmpdir, sizeof(stmpdir), template, getpid());
>  
>        if (mkdir(stmpdir, 0700) != 0) {
> +          CCSIP_DEBUG_ERROR(SIP_F_PREFIX"failed to create temp dir: %s (%s)",
> +                            fname, stmpdir, strerror(errno));

You may want to run a Try for Windows because this is the first instance of strerror in the signaling code that will be compiled on Windows.  I know Windows compilers try to get you to use strerror_s, but not sure if it's just a warning or error they throw.
Attachment #749315 - Flags: review?(ethanhugg) → review+
(Assignee)

Comment 4

4 years ago
(In reply to Ethan Hugg [:ehugg] from comment #3)
> You may want to run a Try for Windows because this is the first instance of
> strerror in the signaling code that will be compiled on Windows.

Good catch. Here's the try push:

https://tbpl.mozilla.org/?tree=Try&rev=108061b1a44e
(Assignee)

Comment 5

4 years ago
(In reply to Randell Jesup [:jesup] from comment #2)
> Note: IIRC this was not ASAN and not part of a long sequence of mochitests;
> it was part of a non-ASAN last-minute check of mochitests before putting
> something up for review, IIRC.  The problem did not repeat.
> 
> The comment about mkdir is likely the cause - I've noticed
> failed/crashed/^c'd (I think) sessions leaving /tmp/SIP-* dirs, which can
> cause mkdir to fail.  If we're not using a mktmp/etc type thing, we need to
> handle collisions....

Yep, that's almost certainly the root cause here. The item in question is supposed to be initialized in the sip_platform_task_loop, but it happens after the return-due-to-failed-mkdir happens.

I'd like to land the diagnostic patch anyway, but I'll put together something that deals more gracefully with this situation.

Updated

4 years ago
Summary: Intermittent test_peerConnection_throwInCallbacks.html | Test timed out after "SDP: Invalid Config pointer." → SIPCC fails to start if it can't mkdir the directory it wants in /tmp
(Assignee)

Comment 6

4 years ago
Created attachment 749515 [details] [diff] [review]
Use more robust mkdtemp() rather than relying on PID
(Assignee)

Updated

4 years ago
Attachment #749315 - Attachment is obsolete: true
(Assignee)

Comment 7

4 years ago
Comment on attachment 749515 [details] [diff] [review]
Use more robust mkdtemp() rather than relying on PID

I have an r+ from ehugg on the logging (and the push try shows that things work just fine under Windows) -- I just need a review on the actual fix, which migrates from using the PID for creating a unique tmpdir name to using the POSIX-standard mkdtemp().
Attachment #749515 - Flags: review?(rjesup)

Updated

4 years ago
Attachment #749515 - Flags: review?(rjesup) → review+
(Assignee)

Comment 8

4 years ago
I'm pushing this one to try also (cross-platform this time) just to make sure we don't get any unexpected behavior from mkdtemp(). It might be POSIX, but that doesn't mean everyone got it right...

https://tbpl.mozilla.org/?tree=Try&rev=80d6726a15d8
(Assignee)

Comment 9

4 years ago
Nice and green all across the board. Landing.

https://hg.mozilla.org/integration/mozilla-inbound/rev/f5e13f8aa8cf
https://hg.mozilla.org/mozilla-central/rev/f5e13f8aa8cf
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24

Updated

4 years ago
Whiteboard: [WebRTC], [blocking-webrtc-] → [WebRTC], [blocking-webrtc-] [qa-]
You need to log in before you can comment on or make changes to this bug.