Closed Bug 1141107 Opened 10 years ago Closed 10 years ago

Only slaves broken by our inability to update talos hit tsvgx,tresize,tp5o_scroll | application crashed [@ nsSocketTransport::InitiateSocket()]

Categories

(Testing :: Talos, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Unassigned)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [leave open])

Ubuntu HW 12.04 fx-team pgo talos svgr
https://treeherder.mozilla.org/logviewer.html#?job_id=2194969&repo=fx-team

07:51:44 INFO - PROCESS-CRASH | tsvgx | application crashed [@ nsSocketTransport::InitiateSocket()]
07:51:44 INFO - Crash dump filename: /tmp/tmpjNnTOS/profile/minidumps/25db96e2-9a6a-5c8d-7f85514e-2fe7f7ee.dmp
07:51:44 INFO - Operating system: Linux
07:51:44 INFO - 0.0.0 Linux 3.2.0-76-generic-pae #111-Ubuntu SMP Tue Jan 13 22:34:29 UTC 2015 i686
07:51:44 INFO - CPU: x86
07:51:44 INFO - GenuineIntel family 6 model 30 stepping 5
07:51:44 INFO - 8 CPUs
07:51:44 INFO - Crash reason: SIGSEGV
07:51:44 INFO - Crash address: 0x0
07:51:44 INFO - Thread 5 (crashed)
07:51:44 INFO - 0 libxul.so!nsSocketTransport::InitiateSocket() [nsSocketTransport2.cpp:a9e7d74b9f5b : 1226 + 0x0]
07:51:44 INFO - eip = 0xb35eebf1 esp = 0xae4fddb0 ebp = 0xae4fdf58 ebx = 0xb6f06140
07:51:44 INFO - esi = 0xb7645d9c edi = 0xb559fb5c eax = 0x00000172 ecx = 0x00000000
07:51:44 INFO - edx = 0xae4fd770 efl = 0x00210286
07:51:44 INFO - Found by: given as instruction pointer in context
07:51:44 INFO - 1 libxul.so!nsSocketTransport::OnSocketEvent(unsigned int, nsresult, nsISupports*) [nsSocketTransport2.cpp:a9e7d74b9f5b : 1765 + 0x7]
07:51:44 INFO - eip = 0xb35eeeac esp = 0xae4fdf60 ebp = 0xae4fdfa8 ebx = 0xb6f06140
07:51:44 INFO - esi = 0x9dace900 edi = 0xb3610e40
07:51:44 INFO - Found by: call frame info
07:51:44 INFO - 2 libxul.so!nsSocketEvent::Run() [nsSocketTransport2.cpp:a9e7d74b9f5b : 79 + 0x1f]
07:51:44 INFO - eip = 0xb35ef149 esp = 0xae4fdfb0 ebp = 0xae4fdfc8 ebx = 0xb6f06140
07:51:44 INFO - esi = 0xb722c6c0 edi = 0xb7237a88
07:51:44 INFO - Found by: call frame info
07:51:44 INFO - 3 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a9e7d74b9f5b : 855 + 0x1]
07:51:44 INFO - eip = 0xb323ccc8 esp = 0xae4fdfd0 ebp = 0xae4fe058 ebx = 0xb6f06140
07:51:44 INFO - esi = 0xb722c6c0 edi = 0xb7237a88
07:51:44 INFO - Found by: call frame info
07:51:44 INFO - 4 libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:a9e7d74b9f5b : 265 + 0x12]
07:51:44 INFO - eip = 0xb3245a25 esp = 0xae4fe060 ebp = 0xae4fe098 ebx = 0xb6f06140
07:51:44 INFO - esi = 0xae4fe08c edi = 0xb7237a80
07:51:44 INFO - Found by: call frame info
07:51:44 INFO - 5 libxul.so!nsSocketTransportService::Run() [nsSocketTransportService2.cpp:a9e7d74b9f5b : 768 + 0xf]
07:51:44 INFO - eip = 0xb35feb0c esp = 0xae4fe0a0 ebp = 0xae4fe108 ebx = 0xb6f06140
OS: Mac OS X → Linux
The actual failure is in reaching self-repair.mozilla.org:

07:51:24 INFO - mozversion INFO | application_repository: https://hg.mozilla.org/integration/fx-team
07:51:24 INFO - mozversion INFO | application_version: 39.0a1
07:51:24 INFO - mozversion INFO | platform_buildid: 20150309053012
07:51:24 INFO - mozversion INFO | platform_changeset: a9e7d74b9f5b
07:51:24 INFO - mozversion INFO | platform_repository: https://hg.mozilla.org/integration/fx-team
07:51:24 INFO - DEBUG : initialized firefox
07:51:24 INFO - DEBUG : command line: /builds/slave/talos-slave/test-pgo/build/application/firefox/firefox -profile /tmp/tmpjNnTOS/profile -tp file:/home/cltbld/talos-slave/test/build/venv/lib/python2.7/site-packages/talos/page_load_test/svgx/svgx.manifest -tpchrome -tpnoisy -tpcycles 1 -tppagecycles 25
07:51:31 INFO - INFO : Browser exited with error code: 11
07:51:36 INFO - INFO : FATAL ERROR: Non-local network connections are disabled and a connection attempt to self-repair.mozilla.org (54.192.119.182) was made.
07:51:36 INFO - You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.
07:51:36 INFO - DEBUG : Terminating: firefox, plugin-container, crashreporter
07:51:36 INFO - mozcrash INFO | Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-linux-pgo/1425904212/firefox-39.0a1.en-US.linux-i686.crashreporter-symbols.zip
07:51:44 INFO - mozcrash INFO | Saved minidump as /builds/slave/talos-slave/test-pgo/build/blobber_upload_dir/25db96e2-9a6a-5c8d-7f85514e-2fe7f7ee.dmp
07:51:44 INFO - mozcrash INFO | Saved app info as /builds/slave/talos-slave/test-pgo/build/blobber_upload_dir/25db96e2-9a6a-5c8d-7f85514e-2fe7f7ee.extra
07:51:44 INFO - __metrics Screen width/height:1600/1200
07:51:44 INFO - colorDepth:24
07:51:44 INFO - Browser inner width/height: 1024/697
07:51:44 INFO - __metrics
07:51:44 INFO - JavaScript error: resource:///modules/WebappManager.jsm, line 48: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIObserverService.removeObserver]
07:51:44 INFO - *************************
07:51:44 INFO - A coding exception was thrown and uncaught in a Task.
07:51:44 INFO - Full message: ReferenceError: ProfileTimesAccessor is not defined
07:51:44 INFO - Full stack: this.TelemetryEnvironment._getProfile<@resource://gre/modules/TelemetryEnvironment.jsm:446:9
07:51:44 INFO - TaskImpl_run@resource://gre/modules/Task.jsm:314:40
07:51:44 INFO - TaskImpl@resource://gre/modules/Task.jsm:275:3
07:51:44 INFO - createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:249:14
07:51:44 INFO - this.TelemetryEnvironment._doGetEnvironmentData</sections.profile@resource://gre/modules/TelemetryEnvironment.jsm:918:24
07:51:44 INFO - this.TelemetryEnvironment._doGetEnvironmentData<@resource://gre/modules/TelemetryEnvironment.jsm:931:25
07:51:44 INFO - TaskImpl_run@resource://gre/modules/Task.jsm:314:40
07:51:44 INFO - TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:393:7
07:51:44 INFO - TaskImpl_run@resource://gre/modules/Task.jsm:322:13
07:51:44 INFO - TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:393:7
07:51:44 INFO - TaskImpl_run@resource://gre/modules/Task.jsm:322:13
07:51:44 INFO - TaskImpl@resource://gre/modules/Task.jsm:275:3
07:51:44 INFO - createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:249:14
07:51:44 INFO - this.TelemetryEnvironment.getEnvironmentData@resource://gre/modules/TelemetryEnvironment.jsm:904:25
07:51:44 INFO - assemblePing@resource://gre/modules/TelemetryPing.jsm:348:14
07:51:44 INFO - savePendingPings@resource://gre/modules/TelemetryPing.jsm:443:12
07:51:44 INFO - this.TelemetryPing<.savePendingPings@resource://gre/modules/TelemetryPing.jsm:191:12
07:51:44 INFO - savePendingPingsClassic@resource://gre/modules/TelemetrySession.jsm:1302:12
07:51:44 INFO - savePendingPings/<@resource://gre/modules/TelemetrySession.jsm:1286:37
07:51:44 INFO - Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:867:23
07:51:44 INFO - this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
07:51:44 INFO - this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37
07:51:44 INFO - Spinner.prototype.observe@resource://gre/modules/AsyncShutdown.jsm:464:9
07:51:44 INFO - *************************
07:51:44 ERROR - JavaScript error: resource:///modules/CustomizableUI.jsm, line 1568: TypeError: aWindowPalette is undefined
07:51:44 INFO - FATAL ERROR: Non-local network connections are disabled and a connection attempt to self-repair.mozilla.org (54.192.119.182) was made.
07:51:44 INFO - You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.
07:51:44 INFO - PROCESS-CRASH | tsvgx | application crashed [@ nsSocketTransport::InitiateSocket()]
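For triage, the two lines that matter in a log like the one above are the "Non-local network connections are disabled" abort and the PROCESS-CRASH summary. The following is a minimal sketch, not part of talos or mozharness, that pulls both out of raw log text. The function name `summarize_log` and the `SAMPLE` string are hypothetical; the regular expressions assume the message wording stays exactly as quoted in this bug.

```python
import re

# Matches the harness abort message as it is worded in the log quoted in this bug.
NON_LOCAL_RE = re.compile(
    r"FATAL ERROR: Non-local network connections are disabled and a "
    r"connection attempt to (?P<host>\S+) \((?P<ip>[0-9.]+)\) was made"
)

# Matches: PROCESS-CRASH | <test> | application crashed [@ <signature>]
CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (?P<test>[^|]+?) \| "
    r"application crashed \[@ (?P<signature>.+?)\]"
)


def summarize_log(text):
    """Return (blocked_hosts, crashes) found in raw log text.

    blocked_hosts is a sorted list of unique (host, ip) pairs;
    crashes is a list of (test, signature) pairs in log order.
    """
    hosts = sorted(
        {(m.group("host"), m.group("ip")) for m in NON_LOCAL_RE.finditer(text)}
    )
    crashes = [
        (m.group("test"), m.group("signature")) for m in CRASH_RE.finditer(text)
    ]
    return hosts, crashes


# Two lines copied from the log in this bug, used as sample input.
SAMPLE = (
    "07:51:36 INFO - INFO : FATAL ERROR: Non-local network connections are "
    "disabled and a connection attempt to self-repair.mozilla.org "
    "(54.192.119.182) was made.\n"
    "07:51:44 INFO - PROCESS-CRASH | tsvgx | application crashed "
    "[@ nsSocketTransport::InitiateSocket()]\n"
)

hosts, crashes = summarize_log(SAMPLE)
print(hosts)
print(crashes)
```

Seeing the blocked host next to the crash signature makes it easier to tell that the segfault in nsSocketTransport::InitiateSocket() is the harness's deliberate abort on a non-local connection rather than an independent networking crash.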
Flags: needinfo?(glind)
Summary: Intermittent tsvgx | application crashed [@ nsSocketTransport::InitiateSocket()] → Intermittent tsvgx,tresize | application crashed [@ nsSocketTransport::InitiateSocket()]
Summary: Intermittent tsvgx,tresize | application crashed [@ nsSocketTransport::InitiateSocket()] → Intermittent tsvgx,tresize,1d8fe559384e,tp5o_scroll | application crashed [@ nsSocketTransport::InitiateSocket()]
Not sure what info we need from glind (we don't seem to have actually asked him any question), but the failure is "caused" by bug 1138323, fixed by bug 1138823, deployed by bug 1139328, but because we're both unable to actually successfully update talos and apparently unwilling to take that inability sufficiently seriously, I'm going to have to disable every slave which hits this. The last time we did this, or perhaps the time before last, we then did without the services of those slaves for about six months. Let's try needinfo with an actual question: jmaher, where's that bug about making talos updates actually work, and what is blocking it from happening?
Component: General → Talos
Flags: needinfo?(glind) → needinfo?(jmaher)
Product: Core → Testing
Summary: Intermittent tsvgx,tresize,1d8fe559384e,tp5o_scroll | application crashed [@ nsSocketTransport::InitiateSocket()] → Only slaves broken by our inability to update talos hit tsvgx,tresize,tp5o_scroll | application crashed [@ nsSocketTransport::InitiateSocket()]
Depends on: 1141416
Turns out that's not the question to ask either, since I'm disabling the exact same set of slaves we disabled last August for failing to update talos, and just finally reimaged in February, so apparently I want to ask someone somewhere what's broken in our linux talos slave image, which causes it to create slaves which fail to update talos.
Flags: needinfo?(jmaher)
The talos update thing is related to mozharness and Python packaging; this looks to be much different.
(I am here. Adding Alessio (:dexter), who actually did the Firefox side of this. https://s-r.m.o. *should be live and reachable*. If something in the tests is wrong, happy to work on it! Forgive our ignorance on all aspects of pref shimming for tests!)
[Mass Closure] Closing bug as WORKSFORME, as the intermittent failure has not been seen for 45+ days. If this has been closed and you feel that it should not have been closed, please reopen and add [leave open] to the whiteboard.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [leave open]
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED