Closed
Bug 839677
Opened 12 years ago
Closed 12 years ago
(bad message queue pointer) Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out
Categories
(Core :: WebRTC: Signaling, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla22
People
(Reporter: abr, Assigned: abr)
References
Details
(Whiteboard: [WebRTC],[blocking-webrtc+] [qa-])
Attachments
(1 file, 1 obsolete file)
12.92 KB, patch | jesup: review+
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=19459762&tree=Mozilla-Inbound
Rev4 MacOSX Lion 10.7 mozilla-inbound opt test mochitest-2 on 2013-02-05 10:30:24
slave: talos-r4-lion-032
26010 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out.
Updated•12 years ago
Keywords: intermittent-failure
Updated•12 years ago
Summary: /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out WITH NO CRASH → Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out WITH NO CRASH
Whiteboard: [WebRTC],[blocking-webrtc+]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee
Comment 3•12 years ago
Ms2ger%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=19512848&tree=Mozilla-Inbound
Rev4 MacOSX Lion 10.7 mozilla-inbound debug test mochitest-2 on 2013-02-06 16:39:53
slave: talos-r4-lion-068
26008 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out.
Summary: Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out WITH NO CRASH → Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out
Assignee
Updated•12 years ago
Assignee
Comment 6•12 years ago
Any TBPL stars after this comment should contain useful logging information that isolates this problem to a smaller part of the system.
Assignee
Updated•12 years ago
Priority: -- → P1
Comment 30•12 years ago
(In reply to Adam Roach [:abr] from comment #6)
> Any TBPL stars after this comment should contain useful logging information
> that isolates this problem to a smaller part of the system.
Has the logging provided any more insight? :-)
Assignee
Comment 32•12 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #30)
> (In reply to Adam Roach [:abr] from comment #6)
> > Any TBPL stars after this comment should contain useful logging information
> > that isolates this problem to a smaller part of the system.
>
> Has the logging provided any more insight? :-)
It has. I spent quite a bit of time on Friday comparing the logs of good runs against bad runs to nail down where things go wrong. I found a fairly consistent difference that I suspected was the problem. However, after changing the code so that events happened in the order that appeared to yield success, I found that forcing the order I thought would cause failure didn't actually cause failure.
The good news is that Bug 845523, now landed on m-c, will eliminate the ability for this set of events to occur in different orders. Hopefully, this will make the actual differences between successful and failing runs easier to find.
Believe me, I understand that this is annoying for the sheriffs, and getting rid of the intermittent oranges is at the top of my priority list.
Comment 33•12 years ago
Thank you for your work on this so far - much appreciated :-)
Assignee
Comment 34•12 years ago
Okay, I think I see the problem now. The failing runs all show the CCApp thread getting on the CPU and starting to process messages before the GSM Task thread has had any CPU time at all. The model here is that the first thing each thread does is set its inbound message queue. But since GSM hasn't run at all, its queue is still NULL. This means the CCApp->GSM message "SETPEERCONNECTION" fails to be delivered.
The failure path is pretty self-evident from that point forward.
Rather than trying to synchronize startup further, I think the easy fix here is to initialize all the queues prior to starting any of the threads.
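The ordering proposed here can be illustrated with a minimal C sketch using POSIX threads. This is not the actual sipcc code: `msg_queue_t`, `queue_create`, `queue_send`, `queue_receive`, and `startup_demo` are hypothetical names invented for the illustration, and the one-slot queue is a deliberately simplified stand-in. The point is only that the queue exists before either thread is created, so the sender can run first without hitting a NULL queue.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot message queue; the real sipcc queue type differs. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    const char     *msg;            /* NULL while the slot is empty */
} msg_queue_t;

/* Hypothetical stand-in for the GSM task's inbound queue. */
static msg_queue_t *gsm_queue = NULL;

static msg_queue_t *queue_create(void)
{
    msg_queue_t *q = calloc(1, sizeof *q);
    if (q == NULL)
        return NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
    return q;
}

/* Returns 0 on success, -1 if the queue does not exist yet. */
static int queue_send(msg_queue_t *q, const char *msg)
{
    if (q == NULL)
        return -1;                  /* the failure mode described above */
    pthread_mutex_lock(&q->lock);
    q->msg = msg;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Blocks until a message is available, then returns it. */
static const char *queue_receive(msg_queue_t *q)
{
    const char *msg;
    pthread_mutex_lock(&q->lock);
    while (q->msg == NULL)
        pthread_cond_wait(&q->ready, &q->lock);
    msg = q->msg;
    q->msg = NULL;
    pthread_mutex_unlock(&q->lock);
    return msg;
}

static void *ccapp_thread(void *arg)
{
    *(int *)arg = queue_send(gsm_queue, "SETPEERCONNECTION");
    return NULL;
}

static void *gsm_thread(void *arg)
{
    *(const char **)arg = queue_receive(gsm_queue);
    return NULL;
}

/* Returns 0 if the message was delivered intact, -1 otherwise. */
int startup_demo(void)
{
    pthread_t ccapp, gsm;
    int send_result = -1;
    const char *received = NULL;
    int ok;

    /* Create the queue before any thread exists. */
    gsm_queue = queue_create();
    if (gsm_queue == NULL)
        return -1;

    /* The sender is started first on purpose; start order no longer matters. */
    if (pthread_create(&ccapp, NULL, ccapp_thread, &send_result) != 0)
        return -1;
    if (pthread_create(&gsm, NULL, gsm_thread, &received) != 0) {
        pthread_join(ccapp, NULL);
        return -1;
    }
    pthread_join(ccapp, NULL);
    pthread_join(gsm, NULL);

    ok = send_result == 0 && received != NULL &&
         strcmp(received, "SETPEERCONNECTION") == 0;

    pthread_mutex_destroy(&gsm_queue->lock);
    pthread_cond_destroy(&gsm_queue->ready);
    free(gsm_queue);
    gsm_queue = NULL;
    return ok ? 0 : -1;
}
```

Calling `startup_demo()` from a `main` function should return 0 regardless of which thread the scheduler runs first. If `queue_create()` were instead called at the top of `gsm_thread`, the send could return -1 whenever the CCApp thread ran first, which is the race described in this comment.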
Assignee
Comment 35•12 years ago
Assignee
Comment 36•12 years ago
It turns out that only the GSM Task copied its queue to a module-local variable. Everyone else uses the globals declared in init.c. The patch I just attached (as yet untested) changes GSM to behave the same way, which should eliminate any possibility of some other thread attempting to enqueue a message to GSM before it's ready.
I'll be requesting review on the patch as soon as I determine that I haven't broken anything, hopefully later today (but before bugzilla goes down for the upgrade).
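The difference between a module-local copy and a shared global can be sketched in a few lines of C. The identifiers `gsm_msg_queue` and `gsm_msgq` are taken from the patch title, but their declarations below, the `queue_t` type, and every function name are assumptions made for illustration; the real code is not shown in this bug. The sketch also assumes the send path read the module-local copy.

```c
#include <stddef.h>

/* Hypothetical queue type; only a pending-message counter is modelled. */
typedef struct {
    int pending;
} queue_t;

static queue_t gsm_queue_storage;

/* Shared pointer, set once by central start-up code (assumed declaration). */
queue_t *gsm_msgq = NULL;

/* Module-local copy, filled in only when the GSM task itself starts
 * (assumed declaration). */
static queue_t *gsm_msg_queue = NULL;

/* Hypothetical central initialisation, run before any thread starts. */
void central_init(void)
{
    gsm_msgq = &gsm_queue_storage;
}

/* Hypothetical first step of the GSM task. */
void gsm_task_start(void)
{
    gsm_msg_queue = gsm_msgq;
}

/* Old pattern: the send path reads the module-local copy. */
int send_via_local_copy(void)
{
    if (gsm_msg_queue == NULL)
        return -1;              /* GSM task has not run yet */
    gsm_msg_queue->pending++;
    return 0;
}

/* New pattern: the send path reads the shared global directly. */
int send_via_shared_global(void)
{
    if (gsm_msgq == NULL)
        return -1;
    gsm_msgq->pending++;
    return 0;
}
```

After `central_init()` but before `gsm_task_start()`, `send_via_local_copy()` returns -1 while `send_via_shared_global()` returns 0. Dropping the local copy removes the window in which the two pointers disagree.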
Assignee
Comment 37•12 years ago
Assignee
Updated•12 years ago
Attachment #721471 - Attachment is obsolete: true
Assignee
Comment 38•12 years ago
Comment on attachment 721493
Remove problematic gsm_msg_queue and use gsm_msgq instead
Randell: This passes signaling_unittests and mochitests on my local machine. Given the state of the try infrastructure, I'm not sure this kind of change warrants a try push. Let me know if you'd prefer to see a try run.
Attachment #721493 - Flags: review?(rjesup)
Updated•12 years ago
Attachment #721493 - Flags: review?(rjesup) → review+
Assignee
Comment 39•12 years ago
Comment 42•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Comment 43•12 years ago
Backed out for now while we investigate bug 848966. I'll re-land whatever comes up clean.
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb432984d5ce
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 44•12 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7f59fd537d9
Windows PGO M2 is looking green.
Comment 45•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Flags: in-testsuite+
Whiteboard: [WebRTC],[blocking-webrtc+] → [WebRTC],[blocking-webrtc+] [qa-]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 48•12 years ago
The log for comment 47 shows a very different pathology from the original source of this bug. To avoid confusion, I'm re-closing this bug and moving the new problem into Bug 853858.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Keywords: intermittent-failure
Resolution: --- → FIXED
Summary: Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out → (bad message queue pointer) Intermittent /tests/dom/media/tests/mochitest/test_peerConnection_basicAudio.html | Test timed out