Closed
Bug 870002
Opened 12 years ago
Closed 11 years ago
Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html,test_peerConnection_throwInCallbacks.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]
Categories
(Core :: WebRTC: Audio/Video, defect)
Tracking
()
People
(Reporter: RyanVM, Assigned: roc)
References
Details
(Keywords: crash, intermittent-failure, regression, Whiteboard: [WebRTC][blocking-webrtc-][leave-open][qa-automation-blocked][webrtc-uplift])
Attachments
(3 files, 1 obsolete file)
1.26 KB, patch | philor: review+
17.65 KB, patch | roc: review+
1.19 KB, patch | ehsan.akhgari: review+
Maybe related to this push?
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=c9737a4136cf
https://tbpl.mozilla.org/php/getParsedLog.php?id=22727216&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound opt test mochitest-3 on 2013-05-08 05:04:13 PDT for push 5971dba36391
slave: talos-r3-w7-107
05:21:48 INFO - 18171 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Got media stream: audio (local)
05:21:48 INFO - 18172 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Call getUserMedia for {"video":true,"fake":true}
05:21:48 INFO - 0[ae82080]: [CCAPP Task|def] ccapi.c:1161: SIPCC-CC_API: 1/4, cc_int_feature2: UI -> GSM: ADDSTREAM
05:21:48 INFO - 0[ae821d0]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 23 :(DCSM_READY:ADDSTREAM )
05:21:48 INFO - 0[ae821d0]: [GSM Task|fsm_sm] sm.c:46: SIPCC-FSM: sm_process_event: DEF 4 : 6C281E04x: sm entry: (IDLE:ADDSTREAM)
05:21:48 INFO - 0[ae821d0]: [GSM Task|fsm_sm] fsmdef.c:3535: SIPCC-FSM: fsmdef_ev_addstream: Entered.
05:21:48 INFO - 0[ae821d0]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/4, sm_process_event: DEF :(IDLE:ADDSTREAM )
05:23:05 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Exited with code -2147483645 during test run
05:23:07 INFO - INFO | automation.py | Application ran for: 0:15:26.393000
05:23:07 INFO - INFO | zombiecheck | Reading PID log: c:\users\cltbld\appdata\local\temp\tmpsmm_wbpidlog
05:23:07 INFO - ==> process 1540 launched child process 956
05:23:07 INFO - ==> process 1540 launched child process 3564
05:23:07 INFO - ==> process 1540 launched child process 404
05:23:07 INFO - ==> process 1540 launched child process 976
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 956
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 3564
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 404
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 976
05:23:07 INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1368011389/firefox-23.0a1.en-US.win32.crashreporter-symbols.zip
05:23:08 INFO - Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1368011389/firefox-23.0a1.en-US.win32.crashreporter-symbols.zip
05:23:21 WARNING - PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | application crashed [Unknown top frame]
Reporter | ||
Comment 1•12 years ago
|
||
Reporter | ||
Updated•12 years ago
|
Summary: Intermittent test_peerConnection_basicAudioVideo.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame] → Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]
Reporter | ||
Comment 9•12 years ago
|
||
Summary: Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame] → Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html,test_peerConnection_throwInCallbacks.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]
Comment 10•12 years ago
|
||
Comment 11•12 years ago
|
||
Comment 12•12 years ago
|
||
Updated•12 years ago
|
Attachment #747282 -
Attachment is obsolete: true
Comment 13•12 years ago
|
||
Comment on attachment 747287 [details] [diff] [review]
move data-processing debugs in MSG to level 5 to allow granular logging
This is just a patch to move a bunch of high-volume debugs in MSG from level 4 (PR_LOG_DEBUG) to level 5, so we can turn on debugging of things like adding tracks in automation without generating 20MB+ log files. The dom/media/tests/mochitests logs drop from 22MB to 800K with mediastreamgraph:4 logging (500K without any MSG logging).
For this sort of patch (just changing debug levels) I'll take whoever can review first.
We could locally define in MediaStreamGraph.h a LOG_MSG_DETAILS or some such and use that instead of PR_LOG_DEBUG+1; I don't really care either way.
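As a rough illustration of why moving the high-volume messages up one level shrinks the logs, here is a toy sketch. It is not the NSPR logging code: `ModuleLog`, `LOG_DETAILS`, and `simulate` are hypothetical names, and the only things taken from the comment above are that PR_LOG_DEBUG is level 4 and that a higher enabled level means more output.

```python
PR_LOG_DEBUG = 4                 # level named in the comment above
LOG_DETAILS = PR_LOG_DEBUG + 1   # hypothetical name for the new high-volume level


class ModuleLog:
    """Toy per-module logger: a message is kept when its level <= the enabled level."""

    def __init__(self, name, enabled_level):
        self.name = name
        self.enabled_level = enabled_level
        self.lines = []

    def log(self, level, message):
        if level <= self.enabled_level:
            self.lines.append(f"{self.name}: {message}")


def simulate(enabled_level, blocks=1000):
    """Return how many lines are emitted for one rare event plus many per-block messages."""
    log = ModuleLog("mediastreamgraph", enabled_level)
    # Rare, interesting event stays at the debug level.
    log.log(PR_LOG_DEBUG, "Adding media stream to the graph")
    # Per-block data-processing chatter moves to the more verbose level.
    for i in range(blocks):
        log.log(LOG_DETAILS, f"processing block {i}")
    return len(log.lines)
```

With the enabled level at 4, only the rare event is recorded; the per-block messages appear only when level 5 is requested explicitly.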
Attachment #747287 -
Flags: review?(tterribe)
Attachment #747287 -
Flags: review?(roc)
Attachment #747287 -
Flags: review?(paul)
Attachment #747287 -
Flags: review?(ehsan)
Attachment #747287 -
Flags: review?(cpearce)
Attachment #747287 -
Flags: review?(adam)
Assignee | ||
Updated•12 years ago
|
Attachment #747287 -
Flags: review?(roc) → review+
Comment 14•12 years ago
|
||
Comment on attachment 747283 [details] [diff] [review]
enable MediaStreamGraph logging to try to hunt down bug 870002
Once the other bug here to change debugs in MSG is approved, this will turn on more "what's going on" debugging for MSG without blowing up the logs (a few hundred K more roughly). I'll take whomever feels they can r+ this. This is intended to be backed out as soon as we've figured out bug 870002 (or decided this debug change doesn't help find it).
Attachment #747283 -
Flags: review?(ted)
Attachment #747283 -
Flags: review?(ryanvm)
Attachment #747283 -
Flags: review?(philringnalda)
Attachment #747283 -
Flags: review?(emorley)
Updated•12 years ago
|
Whiteboard: [leave-open]
Updated•12 years ago
|
Attachment #747283 -
Flags: review?(ted)
Attachment #747283 -
Flags: review?(ryanvm)
Attachment #747283 -
Flags: review?(philringnalda)
Attachment #747283 -
Flags: review?(emorley)
Attachment #747283 -
Flags: review+
Updated•12 years ago
|
Attachment #747287 -
Flags: review?(tterribe)
Attachment #747287 -
Flags: review?(paul)
Attachment #747287 -
Flags: review?(ehsan)
Attachment #747287 -
Flags: review?(cpearce)
Attachment #747287 -
Flags: review?(adam)
Comment 15•12 years ago
|
||
Comment 20•12 years ago
|
||
Ok, we got a hit on the retriggers!
So we see this sequence from MSG right before the crash:
09:01:58 INFO - 3288[f8654c8]: Adding media stream 1301ea30 to the graph
09:01:58 INFO - 3288[f8654c8]: Adding media stream 12e9d1e0 to the graph
09:01:58 INFO - 3288[f8654c8]: Adding MediaInputPort 1620d580 (from 12e9d1e0 to 1301ea30) to the graph
09:01:58 INFO - 3288[f8654c8]: SourceMediaStream 12e9d1e0 creating track 1, rate 1000000, start 0, initial end 33333
Normally in all the other instances above in the log, it has this line following it:
09:01:58 INFO - 3288[f8654c8]: TrackUnionStream 1301e0c8 adding track 1 for input stream c401870 track 1, start ticks 0
but that's missing here.
Roc: any ideas? Debugs/asserts to add?
Flags: needinfo?(roc)
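A check like the one described in this comment could be automated over a saved log. The following is only a sketch: the regular expressions are modeled on the two log lines quoted above, `unmatched_track_creations` is a hypothetical helper, and it assumes the "adding track" line, when present, directly follows the "creating track" line.

```python
import re

# Patterns modeled on the MSG log lines quoted in this comment.
CREATE_RE = re.compile(r"SourceMediaStream (\w+) creating track (\d+)")
ADD_RE = re.compile(r"TrackUnionStream (\w+) adding track (\d+)")


def unmatched_track_creations(lines):
    """Return 'creating track' lines that are not directly followed by an 'adding track' line."""
    unmatched = []
    for i, line in enumerate(lines):
        if CREATE_RE.search(line) is None:
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if ADD_RE.search(following) is None:
            unmatched.append(line)
    return unmatched


sample = [
    "3288[f8654c8]: SourceMediaStream c401870 creating track 1, rate 1000000, start 0, initial end 33333",
    "3288[f8654c8]: TrackUnionStream 1301e0c8 adding track 1 for input stream c401870 track 1, start ticks 0",
    "3288[f8654c8]: SourceMediaStream 12e9d1e0 creating track 1, rate 1000000, start 0, initial end 33333",
]
suspicious = unmatched_track_creations(sample)
```

Here `suspicious` holds only the last line, the creation that has no matching TrackUnionStream entry.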
Reporter | ||
Comment 21•12 years ago
|
||
Comment 23•12 years ago
|
||
Hmmmm. The hit from m-c has the missing TrackUnion line. :-(
Back to "any debugs/asserts we can add?" roc? abr/ehugg? Anything stick out to you about where it's failing?
Updated•12 years ago
|
Whiteboard: [leave-open] → [WebRTC][blocking-webrtc?][leave-open]
Assignee | ||
Comment 24•12 years ago
|
||
Can we get the minidumps being produced by these crashes? That might help...
Flags: needinfo?(roc)
Comment 27•12 years ago
|
||
The minidumps are 0 bytes.... :-(
ted jesup: either that or figure out how to get windows to generate a minidump for you, and disable breakpad
ted since windows does it out-of-process
ted jesup: http://msdn.microsoft.com/en-us/library/windows/desktop/bb787181%28v=vs.85%29.aspx
ted maybe take a slave out of service, configure that, run the test repeatedly till it fails?
ted running under a debugger might change the failure mode
jesup ted: do the slaves run one mochitest at a time, or multiple?
ted multiple
catlee-buildduty in sequence
ted we split the run into 5 chunks, each test run runs all the tests in that chunk
jesup So we may need to emulate that to produce the timings needed to force the failure
ted you can just take a build that has displayed the failure and run the same chunk
ted file a bug to get a test slave set aside
Comment 35•12 years ago
|
||
Added dependencies to possible sources
Updated•12 years ago
|
tracking-firefox22:
--- → ?
tracking-firefox23:
--- → ?
Updated•12 years ago
|
status-firefox22:
--- → affected
status-firefox23:
--- → affected
Comment 43•12 years ago
|
||
I'm going to guess the tracking nom here means there's agreement this is a blocker as a crash regression.
Whiteboard: [WebRTC][blocking-webrtc?][leave-open] → [WebRTC][blocking-webrtc+][leave-open]
Updated•12 years ago
|
Assignee: nobody → roc
Assignee | ||
Comment 54•12 years ago
|
||
I ran 200 iterations of the dom/media mochitests on my Windows laptop, with no failures.
Reporter | ||
Comment 58•12 years ago
|
||
(In reply to TinderboxPushlog Robot from comment #57)
> RyanVM
> https://tbpl.mozilla.org/php/getParsedLog.php?id=22925320&tree=Mozilla-Beta
> Rev3 WINNT 6.1 mozilla-beta pgo test mochitest-3 on 2013-05-13 19:29:15
> slave: talos-r3-w7-051
>
> 19:34:15 WARNING - TEST-UNEXPECTED-FAIL |
> /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.
> html | Exited with code -2147483645 during test run
> 19:34:33 WARNING - PROCESS-CRASH |
> /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.
> html | application crashed [Unknown top frame]
> 19:34:39 ERROR - Return code: 1
Looks like bug 866514 is indeed at fault. "Yay"
Comment 61•12 years ago
|
||
Good. Flagging the regressing bug then.
Assignee | ||
Comment 69•12 years ago
|
||
Ted, is there anything we can do to diagnose the empty minidump? Maybe more diagnostics in the code that creates the minidump?
Flags: needinfo?(ted)
Comment 71•12 years ago
|
||
We simply call into a Microsoft library function: MiniDumpWriteDump. The most common cause for an empty dump is running out of virtual memory, whether due to actual exhaustion or fragmentation. We already gather some memory stats to send with the crash report, so if you'd like to print them out right after we write the minidump (or fail to), you could put some logging statements here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#551
Flags: needinfo?(ted)
Updated•12 years ago
|
Assignee | ||
Comment 73•12 years ago
|
||
I filed bug 872786 on gathering more information when minidump collection fails.
Assignee | ||
Comment 75•12 years ago
|
||
In https://tbpl.mozilla.org/?tree=Try&rev=da5eabb5aafe I have a try push with my patch for bug 872786, to try to gather more data when minidump creation fails. I'll try retriggering this test to see if we can collect some useful data there.
Comment 77•12 years ago
|
||
I retriggered roc's try a bunch more times, and got a hit:
https://tbpl.mozilla.org/php/getParsedLog.php?id=23021195&tree=Try&full=1
00:43:17 INFO - out of memory: 0x0000000000070800 bytes requested
00:43:18 INFO - Minidump creation for thread 328 failed with GetLastError() -2147024865!
00:43:18 INFO - * EXCEPTION_RECORD Code=80000003 Flags=0 Address=7329113f Information[0]=0 Information[1]=-2067537872 Information[2]=3
00:43:18 INFO - * CONTEXT Eax=0 Ebx=0 Ecx=72933896 Edx=3 Esi=728e1ec6 Edi=7293379c Ebp=11e2f6a0 Esp=11e2f698 Eip=7329113f EFlags=202 SegCs=1b SegSs=23 SegDs=23 SegEs=23 SegFs=3b SegGs=0
00:43:18 INFO - * Memory at 732910bf:
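The negative numbers in these logs are easier to read as unsigned 32-bit hex. The sketch below is illustrative only (`to_unsigned32` and `split_hresult` are hypothetical helper names). It shows that the harness exit code -2147483645 is the same value as the EXCEPTION_RECORD Code=80000003 above (the Windows breakpoint exception code), and, assuming the GetLastError() value is an HRESULT, splits it into its facility and code fields.

```python
def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def split_hresult(value):
    """Split an HRESULT into severity bit, 11-bit facility, and 16-bit code."""
    u = to_unsigned32(value)
    return {
        "failure": bool(u >> 31),
        "facility": (u >> 16) & 0x7FF,
        "code": u & 0xFFFF,
    }


exit_code_hex = format(to_unsigned32(-2147483645), "08x")      # process exit code from the test log
last_error_hex = format(to_unsigned32(-2147024865), "08x")     # GetLastError() value from the log above
last_error_fields = split_hresult(-2147024865)
```

Under that reading, the minidump failure is 0x8007001f: facility 7 (Win32) wrapping Win32 error 31 (ERROR_GEN_FAILURE), which by itself says little about the underlying cause.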
Assignee | ||
Comment 79•12 years ago
|
||
There are two failures. Both crashed at the same address with the same OOM message.
http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/2f458521-4315-4295-9c85-336d693d55cc describes the same error when calling MiniDumpWriteDump, but apart from mentioning the memory allocation issue, does not help.
I wonder what generates that "out of memory: 0x0000000000070800 bytes requested" message.
Assignee | ||
Comment 80•12 years ago
|
||
Oh, that message comes from mozalloc_handle_oom.
Assignee | ||
Comment 82•12 years ago
|
||
Running the tests locally, 0x70800 is from allocating a PlanarYCbCrImage --- 640*480*1.5 bytes per pixel. The allocation is made infallibly, which is probably a mistake.
This suggests maybe we're leaking temporarily, or something.
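The arithmetic behind the 0x70800 figure can be checked with a short sketch. It assumes 8-bit planar YCbCr with 4:2:0 chroma subsampling, which is what 1.5 bytes per pixel implies; `i420_frame_bytes` is a hypothetical helper, not a function from the Gecko code.

```python
def i420_frame_bytes(width, height):
    """Bytes needed for one 8-bit planar YCbCr frame with 4:2:0 chroma subsampling."""
    luma = width * height
    # Two chroma planes (Cb and Cr), each subsampled by 2 in both dimensions.
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return luma + chroma


frame_bytes = i420_frame_bytes(640, 480)
frame_bytes_hex = hex(frame_bytes)
```

For a 640x480 frame this gives 460800 bytes, which is 0x70800, matching the size in the out-of-memory message.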
Assignee | ||
Comment 83•12 years ago
|
||
I need to sleep now, but I want to look into the patch in bug 866514 and see if the media streams are being cleaned up properly. If we were temporarily leaking MediaStreams and their cached video frames, but cleaning them up on shutdown, that might cause this.
Comment 86•12 years ago
|
||
Right, I suspect the actual OOM point of failure isn't very interesting here, it's just whatever sucker tries to allocate memory at that point. The issue is "what's actually eating up all our memory".
Comment 87•12 years ago
|
||
"thread 328" is intriguing... Why so many? Something not getting cleaned up?
Assignee | ||
Comment 90•12 years ago
|
||
Interestingly, bug 866514 (or some other change around there) has made us clean up MediaStreams *earlier* when I just run the dom/media tests. Which doesn't help explain this bug at all.
Assignee | ||
Comment 93•12 years ago
|
||
Bug 872996 looks like this bug. However, in bug 872996 I would not expect the code changed in bug 866514 to have run yet. Very mysterious :-(.
Assignee | ||
Comment 94•12 years ago
|
||
We could try backing out 866514 and relanding it one little piece at a time. I don't have any better ideas.
Comment 96•12 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #94)
> We could try backing out 866514 and relanding it one little piece at a time.
> I don't have any better ideas.
We could try the Microsoft AppVerifier - Ethan? How easy would it be to try running it on a mochitest set?
Ted: I assume the out-of-memory could be some type of heap corruption?
Do we have in-tbpl ASAN mochitest runs for mac/linux at all? Could we retrigger them a bunch of times, or if we don't, could we do an ASAN Try build and retrigger?
Try runs are known to hit it (if retriggered enough), so we can submit Try pushes with different pieces landed and use them to bisect the patch. The weekend is coming and infra is more lightly loaded :-)
Flags: needinfo?(ted)
Flags: needinfo?(ethanhugg)
Comment 97•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #96)
> Ted: I assume the out-of-memory could be some type of heap corruption?
> Do we have in-tbpl ASAN mochitest runs for mac/linux at all? Could we
> retrigger them a bunch of times, or if we don't, could we do an ASAN Try
> build and retrigger?
We do not have them on TBPL, but you can run them on Try. I don't know how well it works on Mochitests.
Flags: needinfo?(ted)
Comment 99•12 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #96)
> We could try the Microsoft AppVerifier - Ethan? How easy would it be to try
> running it on a mochitest set?
>
I will try this on AppVerif today. I haven't run the mochitests on AppVerif yet, only the unittests and the by-hand demos.
Flags: needinfo?(ethanhugg)
Comment 102•12 years ago
|
||
I did not find the smoking gun I was looking for but I thought I'd document some AppVerif results here.
These will happen with any page that uses a PeerConnection with default AppVerif checks.
spl_init.c:106
LOCK: EnterCriticalSection() called on Uninitialized CS.
The CS is actually initialized by hand statically three lines earlier in the file. AppVerif complains because InitializeCriticalSection() was not called.
rw_lock_win.cc:55
SRWLOCK: AcquireLockShared() fails on PC shutdown, perhaps lock already destroyed.
Get this several times when navigating away from a page that uses a peer connection
Stack:
vrfcore.dll!_VerifierStopMessageEx() Unknown
vfbasics.dll!_AVrfpVerifySRWLockAcquire@12() Unknown
vfbasics.dll!_AVrfpRtlAcquireSRWLockShared@4() Unknown
> xul.dll!webrtc::RWLockWin::AcquireLockShared() Line 56 C++
xul.dll!webrtc::voe::ChannelManagerBase::GetItem(int itemId) Line 158 C++
xul.dll!webrtc::voe::ChannelManager::GetChannel(const int channelId) Line 77 C++
xul.dll!webrtc::voe::ScopedChannel::ScopedChannel(webrtc::voe::ChannelManager & chManager, int channelId) Line 111 C++
xul.dll!webrtc::VoEBaseImpl::StopPlayout() Line 1471 C++
xul.dll!webrtc::VoEBaseImpl::StopPlayout(int channel) Line 1113 C++
xul.dll!mozilla::WebrtcAudioConduit::~WebrtcAudioConduit() Line 102 C++
xul.dll!mozilla::WebrtcAudioConduit::`scalar deleting destructor'(unsigned int) C++
xul.dll!mozilla::MediaSessionConduit::Release() Line 139 C++
xul.dll!mozilla::RefPtr<mozilla::MediaSessionConduit>::unref(mozilla::MediaSessionConduit * t) Line 172 C++
xul.dll!mozilla::RefPtr<mozilla::MediaSessionConduit>::~RefPtr<mozilla::MediaSessionConduit>() Line 121 C++
xul.dll!mozilla::ConduitDeleteEvent::~ConduitDeleteEvent() C++
xul.dll!mozilla::ConduitDeleteEvent::`scalar deleting destructor'(unsigned int) C++
xul.dll!nsRunnable::Release() Line 31 C++
xul.dll!nsCOMPtr<nsIRunnable>::~nsCOMPtr<nsIRunnable>() Line 523 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 635 C++
xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 238 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 82 C++
xul.dll!MessageLoop::RunInternal() Line 220 C++
xul.dll!MessageLoop::RunHandler() Line 213 C++
xul.dll!MessageLoop::Run() Line 187 C++
xul.dll!nsBaseAppShell::Run() Line 165 C++
xul.dll!nsAppShell::Run() Line 113 C++
xul.dll!nsAppStartup::Run() Line 269 C++
xul.dll!XREMain::XRE_mainRun() Line 3877 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 3944 C++
xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4145 C++
firefox.exe!do_main(int argc, char * * argv, nsIFile * xreDirectory) Line 272 C++
firefox.exe!NS_internal_main(int argc, char * * argv) Line 632 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 105 C++
firefox.exe!__tmainCRTStartup() Line 533 C
firefox.exe!wmainCRTStartup() Line 377 C
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!___RtlUserThreadStart@8() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown
If I turn off lock and srwlock checking I don't get errors. AppVerif has caught heap errors like use-after-free in Firefox for me before.
Comment 111•12 years ago
|
||
Suggestions from bsmedberg:
[12:34] bsmedberg jesup: well can you dump about:memory to the testing log at the beginning of this/these tests?
[12:34] jesup I'm pretty sure we can
[12:36] bsmedberg What we really want is external crash reporting, but that's not a simple project
[12:38] jesup yeah. We need to find some way to solve this in the next week or so, which rules that out
[12:38] bsmedberg jesup: you could also try hacking the tests so that it disables the crash reporter and launches the process using procdump
[12:38] bsmedberg The test harness has changed enough that I don't know where we do that stuff nowadays.
[12:39] jesup ok; I don't know what's involved with that, but I can probably ping ted to help with that
ted: can you help with his suggestions? (either/both) Roc is at a work week in Taiwan, so he will only be iffily available; I'll help as much as I can to put together try runs (probably based on roc's try that found the OOM issue), retrigger them, and analyze anything we can get from them.
Flags: needinfo?(ted)
Comment 113•12 years ago
|
||
I'm in SFO this week, so timezones are not fantastic and I don't have my full complement of machines, but I'll see if I can figure something out here.
I don't think external crash reporting is really going to help us, we've determined that this is just "crashing on OOM". What we really need to find out is *what* is eating the memory.
Flags: needinfo?(ted)
Comment 124•12 years ago
|
||
Could this somehow be related to bug 837835?
Comment 126•12 years ago
|
||
I hacked up some code to dump about:memory from a Mochitest:
http://pastebin.mozilla.org/2432760
It's terrible, but it seems to work.
(In reply to Henrik Skupin (:whimboo) from comment #124)
> Could this somehow be related to bug 837835?
It's possible, but we found the root cause for most of that spike in empty dumps and it was fixed.
Comment 143•12 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #126)
> I hacked up some code to dump about:memory from a Mochitest:
> http://pastebin.mozilla.org/2432760
>
> It's terrible, but it seems to work.
Ted, who's in the best position to add this to the tests?
Flags: needinfo?(ted)
Comment 144•12 years ago
|
||
I was hoping jesup would, but he seems to be busy with other things. I've been in SF this whole week so I don't have my full build environment handy, and I'm travelling tomorrow, so I won't have time for this until Monday at the earliest.
Flags: needinfo?(ted)
Comment 150•12 years ago
|
||
This crash can be seen constantly on try for my upcoming datachannel tests on bug 796894. So it might block its landing.
Blocks: 796894
Status: NEW → ASSIGNED
Updated•12 years ago
|
Whiteboard: [WebRTC][blocking-webrtc+][leave-open] → [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked]
Comment 153•12 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #144)
> I was hoping jesup would, but he seems to be busy with other things. I've
> been in SF this whole week so I don't have my full build environment handy,
> and I'm travelling tomorrow, so I won't have time for this until Monday at
> the earliest.
I can handle it
Updated•12 years ago
|
Whiteboard: [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked] → [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked][webrtc-uplift]
Comment 167•12 years ago
|
||
We're not getting any more useful info out of the MSG logging, and it's causing problems with M-1 log sizes (bug 876545).
Comment 168•12 years ago
|
||
Comment on attachment 754928 [details] [diff] [review]
remove mediastreamgraph:4 logging
r=me if you need it. ;-)
Attachment #754928 -
Flags: review+
Comment 169•12 years ago
|
||
Reporter | ||
Comment 171•12 years ago
|
||
Assignee | ||
Comment 172•12 years ago
|
||
Sorry, I've been a bit out of it with the Taiwan work week and since then, FirefoxOS stuff.
(In reply to Henrik Skupin (:whimboo) from comment #150)
> This crash can be seen constantly on try for my upcoming datachannel tests
> on bug 796894. So it might block its landing.
Can you reproduce that crash locally? If you can, that could really really help!
Comment 177•12 years ago
|
||
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #172)
> Can you reproduce that crash locally? If you can, that could really really
> help!
I cannot fully remember if I hit it locally but it was constantly failing on try. I can try if I can get it reproduced locally. Once I have it I can provide a better stack trace via gdb.
Comment 189•11 years ago
|
||
Sorry, I finally got around to hooking up my about:memory dumping code to these mochitests, I pushed a try run:
https://tbpl.mozilla.org/?tree=Try&rev=25f0a25a7a29
Comment 190•11 years ago
|
||
Someone helpfully retriggered 30 more Windows 7 mochitest-3 jobs on my Try push, and none of them were orange. I triggered 10 more, we'll see if anything happens. I am theorizing that perhaps opening and closing about:memory in a tab for every test changes our GC/CC behavior so as to make an OOM not happen. If I don't see any orange on these runs I'll fiddle the patch tomorrow to only open one about:memory tab.
Comment 191•11 years ago
|
||
Should be disabled now.
Whiteboard: [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked][webrtc-uplift] → [WebRTC][blocking-webrtc-][leave-open][qa-automation-blocked][webrtc-uplift]
Comment 192•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #191)
> Should be disabled now.
Meant to say - disabled per https://bugzilla.mozilla.org/show_bug.cgi?id=866514#c29.
Comment 193•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #190)
> anything happens. I am theorizing that perhaps opening and closing
> about:memory in a tab for every test changes our GC/CC behavior so as to
> make an OOM not happen. If I don't see any orange on these runs I'll fiddle
> the patch tomorrow to only open one about:memory tab.
That's most likely the case. But instead of opening and closing the about:memory tab I wonder if we could directly call any API method. Nicholas, what is getting executed when you open about:memory?
Flags: needinfo?(n.nethercote)
Comment 194•11 years ago
|
||
> Nicholas, what is getting executed when you open about:memory?
toolkit/components/aboutmemory/contents/aboutMemory.js.
Flags: needinfo?(n.nethercote)
Comment 195•11 years ago
|
||
I did look at that, but I don't think it's straightforward to use that from a Mochitest. (The use case here is a little weird.)
Comment 196•11 years ago
|
||
It's interesting how few hits this has gotten since mid-last-week (when we had about 10 in a day)...
Comment 197•11 years ago
|
||
The lack of failures on retriggers with about:memory might be because the intermittent has become rare (the only one since 5/30 was on Birch)... So I'd suggest retriggering some win7 opt/debug builds from a random inbound push to see if you see it there - if you don't, then about:memory isn't hiding the bug.
It makes me concerned that whatever caused it to go away might just be luck.
Comment 198•11 years ago
|
||
After chatting with jesup I realized that we did make a large change in our test infra--we switched all the Windows test slaves to the new IX machines. You'll note that there are no failures on IX machines (comment 188 appears to be a mis-star).
Comment 199•11 years ago
|
||
I verified that Beta is still on Talos-* slaves, but the number of pushes there is low enough we may not see hits from a moderate/low freq intermittent.
It certainly does seem tied to the hardware change. Ted and I speculated that garbage might be building up until (if the odds are right and perhaps enough other stuff is running on the slave) it runs out of memory. The new hardware apparently has more RAM (and timings will be different).
Comment 201•11 years ago
|
||
That hit on beta with bug 866514 shows it wasn't caused by that patch. We've relanded it.
Comment 202•11 years ago
|
||
We should consider removing this from tracking given the latest info
Updated•11 years ago
|
Comment 203•11 years ago
|
||
> I did look at that, but I don't think it's straightforward to use that from
> a Mochitest. (The use case here is a little weird.)
If you can explain exactly what you need I might be able to help further.
Comment 204•11 years ago
|
||
It's not terribly important now, but I was just trying to get a dump of about:memory into the Mochitest logs to try to get some diagnostics on memory usage during the tests.
Updated•11 years ago
|
Comment 205•11 years ago
|
||
I think we ought to be able to close this now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME