870002 - Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html,test_peerConnection_throwInCallbacks.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]

Reporter

Description

•

12 years ago

Maybe related to this push?
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=c9737a4136cf

https://tbpl.mozilla.org/php/getParsedLog.php?id=22727216&tree=Mozilla-Inbound
Rev3 WINNT 6.1 mozilla-inbound opt test mochitest-3 on 2013-05-08 05:04:13 PDT for push 5971dba36391
slave: talos-r3-w7-107

05:21:48 INFO - 18171 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Got media stream: audio (local)
05:21:48 INFO - 18172 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Call getUserMedia for {"video":true,"fake":true}
05:21:48 INFO - 0[ae82080]: [CCAPP Task|def] ccapi.c:1161: SIPCC-CC_API: 1/4, cc_int_feature2: UI -> GSM: ADDSTREAM
05:21:48 INFO - 0[ae821d0]: [GSM Task|def] dcsm.c:532: SIPCC-DCSM: dcsm_process_event: DCSM 23 :(DCSM_READY:ADDSTREAM )
05:21:48 INFO - 0[ae821d0]: [GSM Task|fsm_sm] sm.c:46: SIPCC-FSM: sm_process_event: DEF 4 : 6C281E04x: sm entry: (IDLE:ADDSTREAM)
05:21:48 INFO - 0[ae821d0]: [GSM Task|fsm_sm] fsmdef.c:3535: SIPCC-FSM: fsmdef_ev_addstream: Entered.
05:21:48 INFO - 0[ae821d0]: [GSM Task|def] sm.c:65: SIPCC-GSM: 1/4, sm_process_event: DEF :(IDLE:ADDSTREAM )
05:23:05 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | Exited with code -2147483645 during test run
05:23:07 INFO - INFO | automation.py | Application ran for: 0:15:26.393000
05:23:07 INFO - INFO | zombiecheck | Reading PID log: c:\users\cltbld\appdata\local\temp\tmpsmm_wbpidlog
05:23:07 INFO - ==> process 1540 launched child process 956
05:23:07 INFO - ==> process 1540 launched child process 3564
05:23:07 INFO - ==> process 1540 launched child process 404
05:23:07 INFO - ==> process 1540 launched child process 976
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 956
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 3564
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 404
05:23:07 INFO - INFO | zombiecheck | Checking for orphan process with PID: 976
05:23:07 INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1368011389/firefox-23.0a1.en-US.win32.crashreporter-symbols.zip
05:23:08 INFO - Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1368011389/firefox-23.0a1.en-US.win32.crashreporter-symbols.zip
05:23:21 WARNING - PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html | application crashed [Unknown top frame]
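
A note on the exit code in the log: -2147483645 is a signed 32-bit rendering of a Windows status value. The sketch below reinterprets it as unsigned; the helper names are made up for illustration, and the lookup table deliberately lists only the one value relevant here (0x80000003 is the Windows breakpoint status, STATUS_BREAKPOINT).

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


# Illustrative table only: it holds the single status code seen in this bug,
# not a complete list of Windows status values.
KNOWN_STATUS = {0x80000003: "STATUS_BREAKPOINT"}


def describe_exit_code(exit_code: int) -> str:
    """Format a process exit code as hex plus a name when it is in the table."""
    code = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(describe_exit_code(-2147483645))  # prints: 0x80000003 (STATUS_BREAKPOINT)
```

So the process did not exit normally; it stopped on a breakpoint-style exception, which is consistent with a deliberate abort rather than an ordinary access violation.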

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 1

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22737338&tree=Mozilla-Inbound

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

12 years ago

Summary: Intermittent test_peerConnection_basicAudioVideo.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame] → Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 9

•

12 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=22757204&tree=Mozilla-Inbound

Summary: Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame] → Intermittent test_peerConnection_basicAudioVideo.html,test_peerConnection_basicAudioVideoCombined.html,test_peerConnection_throwInCallbacks.html | Exited with code -2147483645 during test run | application crashed [Unknown top frame]

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

12 years ago

Attached patch move data-processing debugs in MSG to level 5 to allow granular logging (obsolete) — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Comment 11

•

12 years ago

Attached patch enable MediaStreamGraph logging to try to hunt down bug 870002 — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Comment 12

•

12 years ago

Attached patch move data-processing debugs in MSG to level 5 to allow granular logging — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Attachment #747282 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Comment 13

•

12 years ago

Comment on attachment 747287 [details] [diff] [review]
move data-processing debugs in MSG to level 5 to allow granular logging

This is just a patch to move a bunch of high-volume debugs in MSG from level 4 (PR_LOG_DEBUG) to level 5, so we can turn on debugging of things like adding tracks in automation without generating 20MB+ log files. The dom/media/tests/mochitests logs drop from 22MB to 800K with mediastreamgraph:4 logging (500K without any MSG logging).

For this sort of patch (just changing debug levels) I'll take whoever can review first. We could locally define in MediaStreamGraph.h a LOG_MSG_DETAILS or some such and use that instead of PR_LOG_DEBUG+1; I don't really care either way.
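
To illustrate why moving messages to level 5 shrinks the logs, here is a rough Python stand-in for level-gated logging. It is not the NSPR API: `should_log` and the sample messages are assumptions for illustration, and `LOG_MSG_DETAILS` is only the name floated in the comment above. The rule it models is that a module configured at level N emits messages whose level is N or lower.

```python
PR_LOG_DEBUG = 4                    # debug level, as cited in the comment above
LOG_MSG_DETAILS = PR_LOG_DEBUG + 1  # name suggested above; value is level 5


def should_log(module_level: int, message_level: int) -> bool:
    """A message is emitted when the module's configured level is at least
    the level the message was logged at."""
    return module_level >= message_level


# Hypothetical messages: (level, text).
messages = [
    (PR_LOG_DEBUG, "Adding media stream to the graph"),
    (LOG_MSG_DETAILS, "per-iteration data-processing detail"),
]

# With the module at level 4 (as in mediastreamgraph:4), only the first survives.
emitted = [text for level, text in messages if should_log(4, level)]
print(emitted)
```

With the module raised to level 5, both messages would be emitted again, so the high-volume output stays available when it is wanted.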

Attachment #747287 - Flags: review?(tterribe)

Attachment #747287 - Flags: review?(roc)

Attachment #747287 - Flags: review?(paul)

Attachment #747287 - Flags: review?(ehsan)

Attachment #747287 - Flags: review?(cpearce)

Attachment #747287 - Flags: review?(adam)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

12 years ago

Attachment #747287 - Flags: review?(roc) → review+

Randell Jesup [:jesup] (needinfo me)

Comment 14

•

12 years ago

Comment on attachment 747283 [details] [diff] [review]
enable MediaStreamGraph logging to try to hunt down bug 870002

Once the other patch here to change debugs in MSG is approved, this will turn on more "what's going on" debugging for MSG without blowing up the logs (a few hundred K more, roughly). I'll take whoever feels they can r+ this.

This is intended to be backed out as soon as we've figured out bug 870002 (or decided this debug change doesn't help find it).

Attachment #747283 - Flags: review?(ted)

Attachment #747283 - Flags: review?(ryanvm)

Attachment #747283 - Flags: review?(philringnalda)

Attachment #747283 - Flags: review?(emorley)

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Whiteboard: [leave-open]

Phil Ringnalda (:philor)

Updated

•

12 years ago

Attachment #747283 - Flags: review?(ted)

Attachment #747283 - Flags: review?(ryanvm)

Attachment #747283 - Flags: review?(philringnalda)

Attachment #747283 - Flags: review?(emorley)

Attachment #747283 - Flags: review+

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Attachment #747287 - Flags: review?(tterribe)

Attachment #747287 - Flags: review?(paul)

Attachment #747287 - Flags: review?(ehsan)

Attachment #747287 - Flags: review?(cpearce)

Attachment #747287 - Flags: review?(adam)

Randell Jesup [:jesup] (needinfo me)

Comment 15

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f011e4187ec5 https://hg.mozilla.org/integration/mozilla-inbound/rev/38282df9d4f0

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 20

•

12 years ago

Ok, we got a hit on the retriggers! So we see this sequence from MSG right before the crash:

09:01:58 INFO - 3288[f8654c8]: Adding media stream 1301ea30 to the graph
09:01:58 INFO - 3288[f8654c8]: Adding media stream 12e9d1e0 to the graph
09:01:58 INFO - 3288[f8654c8]: Adding MediaInputPort 1620d580 (from 12e9d1e0 to 1301ea30) to the graph
09:01:58 INFO - 3288[f8654c8]: SourceMediaStream 12e9d1e0 creating track 1, rate 1000000, start 0, initial end 33333

Normally, in all the other instances above in the log, it has this line following it:

09:01:58 INFO - 3288[f8654c8]: TrackUnionStream 1301e0c8 adding track 1 for input stream c401870 track 1, start ticks 0

but that's missing here.

Roc: any ideas? Debugs/asserts to add?
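
The check described above (a "creating track" line with no TrackUnionStream "adding track" line after it) can be automated over a saved log. This is a minimal sketch, assuming the line formats quoted in this comment; the function name and the rule "a creation is unmatched if no adding-track line appears before the next creation or the end of the log" are my assumptions, not something the MSG code guarantees.

```python
import re

# Patterns based on the MSG log lines quoted in this comment.
CREATE = re.compile(r"SourceMediaStream (\w+) creating track (\d+)")
ADOPT = re.compile(r"TrackUnionStream (\w+) adding track (\d+)")


def unmatched_track_creations(lines):
    """Return 'creating track' lines that are not followed by a
    TrackUnionStream 'adding track' line before the next creation
    (or before the end of the log)."""
    unmatched = []
    pending = None
    for line in lines:
        if CREATE.search(line):
            if pending is not None:
                unmatched.append(pending)
            pending = line
        elif ADOPT.search(line):
            pending = None
    if pending is not None:
        unmatched.append(pending)
    return unmatched


sample = [
    "3288[f8654c8]: SourceMediaStream aaa creating track 1, rate 1000000, start 0, initial end 33333",
    "3288[f8654c8]: TrackUnionStream bbb adding track 1 for input stream ccc track 1, start ticks 0",
    "3288[f8654c8]: SourceMediaStream 12e9d1e0 creating track 1, rate 1000000, start 0, initial end 33333",
]
for line in unmatched_track_creations(sample):
    print("no TrackUnionStream line after:", line)
```

The matching is by order only, since the stream pointers in the two line types do not obviously correspond in the excerpts above.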

Flags: needinfo?(roc)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 21

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/f011e4187ec5 https://hg.mozilla.org/mozilla-central/rev/38282df9d4f0

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 23

•

12 years ago

Hmmmm. The hit from m-c has the missing TrackUnion line. :-( Back to "any debugs/asserts we can add?" roc? abr/ehugg? Anything stick out to you about where it's failing?

Jason Smith [:jsmith]

Updated

•

12 years ago

Whiteboard: [leave-open] → [WebRTC][blocking-webrtc?][leave-open]

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 24

•

12 years ago

Can we get the minidumps being produced by these crashes? That might help...

Flags: needinfo?(roc)

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 27

•

12 years ago

The minidumps are 0 bytes.... :-(

<ted> jesup: either that or figure out how to get windows to generate a minidump for you, and disable breakpad
<ted> since windows does it out-of-process
<ted> jesup: http://msdn.microsoft.com/en-us/library/windows/desktop/bb787181%28v=vs.85%29.aspx
<ted> maybe take a slave out of service, configure that, run the test repeatedly till it fails?
<ted> running under a debugger might change the failure mode
<jesup> ted: do the slaves run one mochitest at a time, or multiple?
<ted> multiple
<catlee-buildduty> in sequence
<ted> we split the run into 5 chunks, each test run runs all the tests in that chunk
<jesup> So we may need to emulate that to produce the timings needed to force the failure
<ted> you can just take a build that has displayed the failure and run the same chunk
<ted> file a bug to get a test slave set aside
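
For anyone trying to reproduce this off the slaves: the point above is that each job runs one chunk of the full test list, in sequence. Below is a minimal sketch of contiguous chunking. The function name and the splitting rule are assumptions for illustration; the real harness may divide the tests differently, so use the harness's own chunk options when actually re-running a chunk.

```python
def chunk_tests(tests, total_chunks, this_chunk):
    """Return the contiguous slice of `tests` for a 1-based chunk number,
    spreading any remainder over the first chunks."""
    per_chunk, extra = divmod(len(tests), total_chunks)
    start = (this_chunk - 1) * per_chunk + min(this_chunk - 1, extra)
    size = per_chunk + (1 if this_chunk <= extra else 0)
    return tests[start:start + size]


# Hypothetical test list; real runs have far more entries.
all_tests = [f"test_{i:02d}.html" for i in range(12)]

# The third of five chunks, analogous to a "mochitest-3" job.
print(chunk_tests(all_tests, 5, 3))
```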

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 35

•

12 years ago

Added dependencies to possible sources

Depends on: 863224, 866514, 868406

Comment hidden (Legacy TBPL/Treeherder Robot)

Maire Reavy [:mreavy]

Updated

•

12 years ago

tracking-firefox22: --- → ?

tracking-firefox23: --- → ?

Maire Reavy [:mreavy]

Updated

•

12 years ago

status-firefox22: --- → affected

status-firefox23: --- → affected

Jason Smith [:jsmith]

Comment 43

•

12 years ago

I'm going to guess the tracking nom here means there's agreement this is a blocker as a crash regression.

Whiteboard: [WebRTC][blocking-webrtc?][leave-open] → [WebRTC][blocking-webrtc+][leave-open]

Comment hidden (Legacy TBPL/Treeherder Robot)

Maire Reavy [:mreavy]

Updated

•

12 years ago

Assignee: nobody → roc

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 54

•

12 years ago

I ran 200 iterations of the dom/media mochitests on my Windows laptop, with no failures.

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 58

•

12 years ago

(In reply to TinderboxPushlog Robot from comment #57)
> RyanVM
> https://tbpl.mozilla.org/php/getParsedLog.php?id=22925320&tree=Mozilla-Beta
> Rev3 WINNT 6.1 mozilla-beta pgo test mochitest-3 on 2013-05-13 19:29:15
> slave: talos-r3-w7-051
>
> 19:34:15 WARNING - TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | Exited with code -2147483645 during test run
> 19:34:33 WARNING - PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | application crashed [Unknown top frame]
> 19:34:39 ERROR - Return code: 1

Looks like bug 866514 is indeed at fault. "Yay"

Comment hidden (Legacy TBPL/Treeherder Robot)

Jason Smith [:jsmith]

Comment 61

•

12 years ago

Good. Flagging the regressing bug then.

Blocks: 866514

No longer depends on: 863224, 866514, 868406

Keywords: regression

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 69

•

12 years ago

Ted, is there anything we can do to diagnose the empty minidump? Maybe more diagnostics in the code that creates the minidump?

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

(not currently active) Ted Mielczarek

Comment 71

•

12 years ago

We simply call into a Microsoft library function: MiniDumpWriteDump. The most common cause for an empty dump is running out of virtual memory, whether due to actual exhaustion or fragmentation. If you'd like to print out memory stats right after we write the minidump (or fail to), we already gather some to send with the crash report; you could put some logging statements here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#551

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

Alex Keybl [:akeybl]

Updated

•

12 years ago

tracking-firefox22: ? → +

tracking-firefox23: ? → +

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 73

•

12 years ago

I filed bug 872786 on gathering more information when minidump collection fails.

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 75

•

12 years ago

In https://tbpl.mozilla.org/?tree=Try&rev=da5eabb5aafe I have a try push with my patch for bug 872786, to try to gather more data when minidump creation fails. I'll try retriggering this test to see if we can collect some useful data there.

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 77

•

12 years ago

I retriggered roc's try a bunch more times, and got a hit:
https://tbpl.mozilla.org/php/getParsedLog.php?id=23021195&tree=Try&full=1

00:43:17 INFO - out of memory: 0x0000000000070800 bytes requested
00:43:18 INFO - Minidump creation for thread 328 failed with GetLastError() -2147024865!
00:43:18 INFO - * EXCEPTION_RECORD Code=80000003 Flags=0 Address=7329113f Information[0]=0 Information[1]=-2067537872 Information[2]=3
00:43:18 INFO - * CONTEXT Eax=0 Ebx=0 Ecx=72933896 Edx=3 Esi=728e1ec6 Edi=7293379c Ebp=11e2f6a0 Esp=11e2f698 Eip=7329113f EFlags=202 SegCs=1b SegSs=23 SegDs=23 SegEs=23 SegFs=3b SegGs=0
00:43:18 INFO - * Memory at 732910bf:
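
The GetLastError() value in that log is easier to read once it is unpacked. The sketch below treats it as an HRESULT-style value, which is an assumption on my part about how the minidump failure is reported, and splits it into the standard fields; `decode_hresult` is a made-up helper name.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT-style value into its standard fields."""
    hr = value & 0xFFFFFFFF          # reinterpret the signed value as unsigned
    return {
        "hex": f"0x{hr:08X}",
        "failure": bool(hr >> 31),   # top bit: severity (1 = failure)
        "facility": (hr >> 16) & 0x7FF,
        "code": hr & 0xFFFF,
    }


print(decode_hresult(-2147024865))
# prints: {'hex': '0x8007001F', 'failure': True, 'facility': 7, 'code': 31}
```

Facility 7 is FACILITY_WIN32 and Win32 error 31 is ERROR_GEN_FAILURE, a generic failure, so the value by itself does not say much about why the dump could not be written.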

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 79

•

12 years ago

There are two failures. Both crashed at the same address with the same OOM message. http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/2f458521-4315-4295-9c85-336d693d55cc describes the same error when calling MiniDumpWriteDump, but apart from mentioning the memory allocation issue, does not help. I wonder what generates that "out of memory: 0x0000000000070800 bytes requested" message.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 80

•

12 years ago

Oh, that message comes from mozalloc_handle_oom.

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

12 years ago

Comment 82

•

12 years ago

Running the tests locally, 0x70800 is from allocating a PlanarYCbCrImage --- 640*480*1.5 bytes per pixel. The allocation is made infallibly, which is probably a mistake. This suggests maybe we're leaking temporarily, or something.
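
As a quick check of that arithmetic, here is a small sketch that computes the buffer size for a planar YCbCr frame, assuming 4:2:0 subsampling (which is what the 1.5 bytes per pixel figure implies). The function name is made up for illustration.

```python
def ycbcr420_size(width: int, height: int) -> int:
    """Bytes needed for a planar YCbCr 4:2:0 frame with 8-bit samples:
    one full-resolution Y plane plus two quarter-resolution chroma planes."""
    y_plane = width * height
    chroma_plane = (width // 2) * (height // 2)
    return y_plane + 2 * chroma_plane


size = ycbcr420_size(640, 480)
print(size, hex(size))  # prints: 460800 0x70800
```

That matches the 0x70800 bytes in the "out of memory" message, i.e. a single 640x480 video frame.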

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 83

•

12 years ago

I need to sleep now, but I want to look into the patch in bug 866514 and see if the media streams are being cleaned up properly. If we were temporarily leaking MediaStreams and their cached video frames, but cleaning them up on shutdown, that might cause this.

Comment hidden (Legacy TBPL/Treeherder Robot)

(not currently active) Ted Mielczarek

Comment 86

•

12 years ago

Right, I suspect the actual OOM point of failure isn't very interesting here, it's just whatever sucker tries to allocate memory at that point. The issue is "what's actually eating up all our memory".

Randell Jesup [:jesup] (needinfo me)

Comment 87

•

12 years ago

"thread 328" is intriguing... Why so many? Something not getting cleaned up?

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 90

•

12 years ago

Interestingly, bug 866514 (or some other change around there) has made us clean up MediaStreams *earlier* when I just run the dom/media tests. Which doesn't help explain this bug at all.

Comment hidden (Legacy TBPL/Treeherder Robot)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 93

•

12 years ago

Bug 872996 looks like this bug. However, in bug 872996 I would not expect the code changed in bug 866514 to have run yet. Very mysterious :-(.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 94

•

12 years ago

We could try backing out 866514 and relanding it one little piece at a time. I don't have any better ideas.

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 96

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #94)
> We could try backing out 866514 and relanding it one little piece at a time.
> I don't have any better ideas.

We could try the Microsoft AppVerifier - Ethan? How easy would it be to try running it on a mochitest set?

Ted: I assume the out-of-memory could be some type of heap corruption?
Do we have in-tbpl ASAN mochitest runs for mac/linux at all? Could we retrigger them a bunch of times, or if we don't, could we do an ASAN Try build and retrigger?

Try's are known to hit it (if retriggered enough) so we can submit Trys with different pieces landed and then use that to bisect the patch. Weekend is coming and infra is more lightly loaded :-)

Flags: needinfo?(ted)

Flags: needinfo?(ethanhugg)

(not currently active) Ted Mielczarek

Comment 97

•

12 years ago

(In reply to Randell Jesup [:jesup] from comment #96)
> Ted: I assume the out-of-memory could be some type of heap corruption?
> Do we have in-tbpl ASAN mochitest runs for mac/linux at all? Could we
> retrigger them a bunch of times, or if we don't, could we do an ASAN Try
> build and retrigger?

We do not have them on TBPL, but you can run them on Try. I don't know how well it works on Mochitests.

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

Ethan Hugg [:ehugg]

Comment 99

•

12 years ago

(In reply to Randell Jesup [:jesup] from comment #96)
> We could try the Microsoft AppVerifier - Ethan? How easy would it be to try
> running it on a mochitest set?
>

I will try this on AppVerif today. I haven't run the mochitests on AppVerif yet, only the unittests and the by-hand demos.

Flags: needinfo?(ethanhugg)

Comment hidden (Legacy TBPL/Treeherder Robot)

Ethan Hugg [:ehugg]

Comment 102

•

12 years ago

I did not find the smoking gun I was looking for but I thought I'd document some AppVerif results here. These will happen with any page that uses a PeerConnection with default AppVerif checks.

spl_init.c:106
LOCK: EnterCriticalSection() called on Uninitialized CS. The CS is actually initialized by hand statically three lines earlier in the file. AppVerif complains because InitializeCriticalSection() was not called.

rw_lock_win.cc:55
SRWLOCK: AcquireLockShared() fails on PC shutdown, perhaps lock already destroyed. Get this several times when navigating away from a page that uses a peer connection.

Stack:
vrfcore.dll!_VerifierStopMessageEx() Unknown
vfbasics.dll!_AVrfpVerifySRWLockAcquire@12() Unknown
vfbasics.dll!_AVrfpRtlAcquireSRWLockShared@4() Unknown
> xul.dll!webrtc::RWLockWin::AcquireLockShared() Line 56 C++
xul.dll!webrtc::voe::ChannelManagerBase::GetItem(int itemId) Line 158 C++
xul.dll!webrtc::voe::ChannelManager::GetChannel(const int channelId) Line 77 C++
xul.dll!webrtc::voe::ScopedChannel::ScopedChannel(webrtc::voe::ChannelManager & chManager, int channelId) Line 111 C++
xul.dll!webrtc::VoEBaseImpl::StopPlayout() Line 1471 C++
xul.dll!webrtc::VoEBaseImpl::StopPlayout(int channel) Line 1113 C++
xul.dll!mozilla::WebrtcAudioConduit::~WebrtcAudioConduit() Line 102 C++
xul.dll!mozilla::WebrtcAudioConduit::`scalar deleting destructor'(unsigned int) C++
xul.dll!mozilla::MediaSessionConduit::Release() Line 139 C++
xul.dll!mozilla::RefPtr<mozilla::MediaSessionConduit>::unref(mozilla::MediaSessionConduit * t) Line 172 C++
xul.dll!mozilla::RefPtr<mozilla::MediaSessionConduit>::~RefPtr<mozilla::MediaSessionConduit>() Line 121 C++
xul.dll!mozilla::ConduitDeleteEvent::~ConduitDeleteEvent() C++
xul.dll!mozilla::ConduitDeleteEvent::`scalar deleting destructor'(unsigned int) C++
xul.dll!nsRunnable::Release() Line 31 C++
xul.dll!nsCOMPtr<nsIRunnable>::~nsCOMPtr<nsIRunnable>() Line 523 C++
xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result) Line 635 C++
xul.dll!NS_ProcessNextEvent(nsIThread * thread, bool mayWait) Line 238 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 82 C++
xul.dll!MessageLoop::RunInternal() Line 220 C++
xul.dll!MessageLoop::RunHandler() Line 213 C++
xul.dll!MessageLoop::Run() Line 187 C++
xul.dll!nsBaseAppShell::Run() Line 165 C++
xul.dll!nsAppShell::Run() Line 113 C++
xul.dll!nsAppStartup::Run() Line 269 C++
xul.dll!XREMain::XRE_mainRun() Line 3877 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 3944 C++
xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4145 C++
firefox.exe!do_main(int argc, char * * argv, nsIFile * xreDirectory) Line 272 C++
firefox.exe!NS_internal_main(int argc, char * * argv) Line 632 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 105 C++
firefox.exe!__tmainCRTStartup() Line 533 C
firefox.exe!wmainCRTStartup() Line 377 C
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!___RtlUserThreadStart@8() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown

If I turn off lock and srwlock checking I don't get errors. AppVerif has caught heap errors like use-after-free in Firefox for me before.

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 111

•

12 years ago

Suggestions from bsmedberg:

[12:34] <bsmedberg> jesup: well can you dump about:memory to the testing log at the beginning of this/these tests?
[12:34] <jesup> I'm pretty sure we can
[12:36] <bsmedberg> What we really want is external crash reporting, but that's not a simple project
[12:38] <jesup> yeah. We need to find some way to solve this in the next week or so, which rules that out
[12:38] <bsmedberg> jesup: you could also try hacking the tests so that it disables the crash reporter and launches the process using procdump
[12:38] <bsmedberg> The test harness has changed enough that I don't know where we do that stuff nowadays.
[12:39] <jesup> ok; I don't know what's involved with that, but I can probably ping ted to help with that

ted: can you help with his suggestions? (either/both)

Roc is at a work-week in Taiwan, so will only be iffily available; I'll help as much as I can to put together try runs (probably based on roc's try that found the OOM issue) and retrigger/etc, and analyze anything we can get from them.

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

(not currently active) Ted Mielczarek

Comment 113

•

12 years ago

I'm in SFO this week, so timezones are not fantastic and I don't have my full complement of machines, but I'll see if I can figure something out here. I don't think external crash reporting is really going to help us, we've determined that this is just "crashing on OOM". What we really need to find out is *what* is eating the memory.

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 124

•

12 years ago

Could this somehow be related to bug 837835?

Comment hidden (Legacy TBPL/Treeherder Robot)

(not currently active) Ted Mielczarek

Comment 126

•

12 years ago

I hacked up some code to dump about:memory from a Mochitest:
http://pastebin.mozilla.org/2432760

It's terrible, but it seems to work.

(In reply to Henrik Skupin (:whimboo) from comment #124)
> Could this somehow be related to bug 837835?

It's possible, but we found the root cause for most of that spike in empty dumps and it was fixed.

Comment hidden (Legacy TBPL/Treeherder Robot)

Alex Keybl [:akeybl]

Comment 143

•

12 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #126)
> I hacked up some code to dump about:memory from a Mochitest:
> http://pastebin.mozilla.org/2432760
>
> It's terrible, but it seems to work.

Ted, who's in the best position to add this to the tests?

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 144

•

12 years ago

I was hoping jesup would, but he seems to be busy with other things. I've been in SF this whole week so I don't have my full build environment handy, and I'm travelling tomorrow, so I won't have time for this until Monday at the earliest.

Flags: needinfo?(ted)

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 150

•

12 years ago

This crash can be seen constantly on try for my upcoming datachannel tests on bug 796894. So it might block its landing.

Blocks: 796894

Status: NEW → ASSIGNED

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

12 years ago

Whiteboard: [WebRTC][blocking-webrtc+][leave-open] → [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked]

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 153

•

12 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #144)
> I was hoping jesup would, but he seems to be busy with other things. I've
> been in SF this whole week so I don't have my full build environment handy,
> and I'm travelling tomorrow, so I won't have time for this until Monday at
> the earliest.

I can handle it.

Comment hidden (Legacy TBPL/Treeherder Robot)

Maire Reavy [:mreavy]

Updated

•

12 years ago

Whiteboard: [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked] → [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked][webrtc-uplift]

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 167

•

12 years ago

Attached patch remove mediastreamgraph:4 logging — Details — Splinter Review

We're not getting any more useful info out of the MSG logging, and it's causing problems with M-1 log sizes (bug 876545).

(no longer active)

Comment 168

•

12 years ago

Comment on attachment 754928 [details] [diff] [review] remove mediastreamgraph:4 logging r=me if you need it. ;-)

Attachment #754928 - Flags: review+

Randell Jesup [:jesup] (needinfo me)

Comment 169

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a6bc6c0bb3bd

Comment hidden (Legacy TBPL/Treeherder Robot)

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 171

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/a6bc6c0bb3bd

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 172

•

12 years ago

Sorry, I've been a bit out of it with the Taiwan work week and since then, FirefoxOS stuff.

(In reply to Henrik Skupin (:whimboo) from comment #150)
> This crash can be seen constantly on try for my upcoming datachannel tests
> on bug 796894. So it might block its landing.

Can you reproduce that crash locally? If you can, that could really really help!

Comment hidden (Legacy TBPL/Treeherder Robot)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 177

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #172)
> Can you reproduce that crash locally? If you can, that could really really
> help!

I cannot fully remember if I hit it locally but it was constantly failing on try. I can see whether I can reproduce it locally. Once I have it I can provide a better stack trace via gdb.

Comment hidden (Legacy TBPL/Treeherder Robot)

(not currently active) Ted Mielczarek

Comment 189

•

11 years ago

Sorry, I finally got around to hooking up my about:memory dumping code to these mochitests, I pushed a try run: https://tbpl.mozilla.org/?tree=Try&rev=25f0a25a7a29

(not currently active) Ted Mielczarek

Comment 190

•

11 years ago

Someone helpfully retriggered 30 more Windows 7 mochitest-3 jobs on my Try push, and none of them were orange. I triggered 10 more, we'll see if anything happens. I am theorizing that perhaps opening and closing about:memory in a tab for every test changes our GC/CC behavior so as to make an OOM not happen. If I don't see any orange on these runs I'll fiddle the patch tomorrow to only open one about:memory tab.

Jason Smith [:jsmith]

Comment 191

•

11 years ago

Should be disabled now.

status-firefox22: affected → disabled

Whiteboard: [WebRTC][blocking-webrtc+][leave-open][qa-automation-blocked][webrtc-uplift] → [WebRTC][blocking-webrtc-][leave-open][qa-automation-blocked][webrtc-uplift]

Jason Smith [:jsmith]

Comment 192

•

11 years ago

(In reply to Jason Smith [:jsmith] from comment #191) > Should be disabled now. Meant to say - disabled per https://bugzilla.mozilla.org/show_bug.cgi?id=866514#c29.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 193

•

11 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #190)
> anything happens. I am theorizing that perhaps opening and closing
> about:memory in a tab for every test changes our GC/CC behavior so as to
> make an OOM not happen. If I don't see any orange on these runs I'll fiddle
> the patch tomorrow to only open one about:memory tab.

That's most likely the case. But instead of opening and closing the about:memory tab I wonder if we could directly call any API method. Nicholas, what is getting executed when you open about:memory?

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 194

•

11 years ago

> Nicholas, what is getting executed when you open about:memory?

toolkit/components/aboutmemory/contents/aboutMemory.js.

Flags: needinfo?(n.nethercote)

(not currently active) Ted Mielczarek

Comment 195

•

11 years ago

I did look at that, but I don't think it's straightforward to use that from a Mochitest. (The use case here is a little weird.)

status-firefox22: disabled → affected

Randell Jesup [:jesup] (needinfo me)

Comment 196

•

11 years ago

It's interesting how few hits this has gotten since mid-last-week (when we had about 10 in a day)...

Randell Jesup [:jesup] (needinfo me)

Comment 197

•

11 years ago

The lack of failures on retriggers with about:memory might be because the intermittent has become rare (the only one since 5/30 was on Birch)... So I'd suggest retriggering some win7 opt/debug builds from a random inbound push to see if you see it there - if you don't, then about:memory isn't hiding the bug. Makes me concerned that what caused it to go away might just be luck.

(not currently active) Ted Mielczarek

Comment 198

•

11 years ago

After chatting with jesup I realized that we did make a large change in our test infra--we switched all the Windows test slaves to the new IX machines. You'll note that there are no failures on IX machines (comment 188 appears to be a mis-star).

Randell Jesup [:jesup] (needinfo me)

Comment 199

•

11 years ago

I verified that Beta is still on Talos-* slaves, but the number of pushes there is low enough that we may not see hits from a moderate/low frequency intermittent. It certainly does seem tied to the hardware change. Ted and I speculated it might be garbage building up and (if the odds are right and enough other stuff is running on the slave, perhaps) it runs out of memory. The new hardware apparently has more RAM (and timings will be different).

Comment hidden (Legacy TBPL/Treeherder Robot)

Randell Jesup [:jesup] (needinfo me)

Comment 201

•

11 years ago

That hit on beta with bug 866514 shows it wasn't caused by that patch. We've relanded it.

Randell Jesup [:jesup] (needinfo me)

Comment 202

•

11 years ago

We should consider removing this from tracking given the latest info

Randell Jesup [:jesup] (needinfo me)

Updated

•

11 years ago

tracking-firefox22: + → ?

tracking-firefox23: + → ?

Nicholas Nethercote [inactive]

Comment 203

•

11 years ago

> I did look at that, but I don't think it's straightforward to use that from
> a Mochitest. (The use case here is a little weird.)

If you can explain exactly what you need I might be able to help further.

(not currently active) Ted Mielczarek

Comment 204

•

11 years ago

It's not terribly important now, but I was just trying to get a dump of about:memory into the Mochitest logs to try to get some diagnostics on memory usage during the tests.

Alex Keybl [:akeybl]

Updated

•

11 years ago

tracking-firefox22: ? → -

tracking-firefox23: ? → -

move data-processing debugs in MSG to level 5 to allow granular logging | 12 years ago | Randell Jesup [:jesup] (needinfo me) | 13.78 KB, patch | Details | Diff | Splinter Review
enable MediaStreamGraph logging to try to hunt down bug 870002 | 12 years ago | Randell Jesup [:jesup] (needinfo me) | 1.26 KB, patch | philor : review+ | Details | Diff | Splinter Review
move data-processing debugs in MSG to level 5 to allow granular logging | 12 years ago | Randell Jesup [:jesup] (needinfo me) | 17.65 KB, patch | roc : review+ | Details | Diff | Splinter Review
remove mediastreamgraph:4 logging | 12 years ago | Randell Jesup [:jesup] (needinfo me) | 1.19 KB, patch | ehsan.akhgari : review+ | Details | Diff | Splinter Review