Closed
Bug 1028816
Opened 10 years ago
Closed 10 years ago
Segmentation fault on gaia try server with pull request from bug 1017490
Categories: Infrastructure & Operations Graveyard :: CIDuty, task
Tracking: Not tracked
RESOLVED FIXED
People: Reporter: yurenju, Assigned: jgriffin
Attachments (1 file): patch, 1019 bytes | jhford: review+ | jgriffin: checked-in+
I got a segmentation fault when pushing a pull request to the Gaia try server.
https://tbpl.mozilla.org/?tree=Gaia-Try&rev=6c6b3548ffe0f9b0950c055ed657e91264607907
Log for G (Gaia unit tests):
> 20:28:27 INFO - /builds/slave/test/gaia/xulrunner-sdk/bin/run-mozilla.sh /builds/slave/test/gaia/xulrunner-sdk/bin/xpcshell build/make_gaia_shared.js
> 20:28:30 INFO - make[1]: Leaving directory `/builds/slave/test/gaia/apps/email'
> 20:28:30 INFO - copy verticalhome to build_stage/
> 20:28:30 INFO - execute verticalhome/build/build.js
> 20:28:30 INFO - run-js-command verticalhome/app/build
> 20:28:31 INFO - copy system to build_stage/
> 20:28:31 INFO - execute system/build/build.js
> 20:28:31 INFO - run-js-command system/app/build
> 20:28:31 INFO - copy gallery to build_stage/
> 20:28:31 INFO - execute gallery/build/build.js
> 20:28:31 INFO - run-js-command gallery/app/build
> 20:28:32 INFO - copy clock to build_stage/
> 20:28:32 INFO - execute clock/build/build.js
> 20:28:32 INFO - run-js-command clock/app/build
> 20:28:32 INFO - /bin/bash: line 1: 2686 Segmentation fault /builds/slave/test/gaia/xulrunner-sdk/bin/run-mozilla.sh /builds/slave/test/gaia/xulrunner-sdk/bin/xpcshell -f "/builds/slave/test/gaia/build/xpcshell-commonjs.js" -e "run('app/build');"
> 20:28:32 INFO - make: *** [clock] Error 139
> 20:28:32 ERROR - Return code: 2
> 20:28:32 ERROR - 2 not in success codes: [0]
> 20:28:32 FATAL - Halting on failure while running ['make']
> 20:28:32 FATAL - Running post_fatal callback...
> 20:28:32 FATAL - Exiting 2
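The log reports the failure at two levels. As a reading aid: a POSIX shell reports a command killed by signal N as exit status 128 + N, so make's "Error 139" decodes to signal 11 (SIGSEGV), while the "Return code: 2" that mozharness sees is make's own failure status, not the signal. The helper `decode_shell_status` below is hypothetical and uses the simple "above 128 means a signal" heuristic.

```python
import signal

def decode_shell_status(status):
    """Interpret an exit status as reported by a POSIX shell (hypothetical helper).

    Shells report a child killed by signal N as 128 + N; statuses above 128 are
    assumed to be signal deaths, everything else is an ordinary exit code.
    """
    if status > 128:
        return "killed by " + signal.Signals(status - 128).name
    return "exited with code %d" % status

print(decode_shell_status(139))  # make's "Error 139"      -> killed by SIGSEGV
print(decode_shell_status(2))    # make's own return code  -> exited with code 2
```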
After investigating, I believe this is a mozharness issue. I extracted a snippet[1] from run_command in mozharness/base/script.py[2] that uses the same arguments.
You can download that gist, change "cwd" to your Gaia path, and run it; you will get the segmentation fault on a linux64 box.
[1] https://gist.github.com/yurenju/4abf50117c48478288d4
[2] http://hg.mozilla.org/build/mozharness/file/3347b848256c/mozharness/base/script.py#l688
Assignee
Comment 1•10 years ago
It's hard to say where the actual bug is. I don't think it's a mozharness bug, although we may be able to work around it in mozharness. It's likely a 'make' bug, but it could also be an xpcshell bug or a Python bug.
The symptoms: On linux64 at least, using subprocess.Popen to invoke a make call which in turn invokes xpcshell to run some JS causes a segfault. This doesn't occur on OSX. Switching subprocess.Popen to subprocess.call avoids the segfault, but if we switched that in mozharness, we'd lose timeout handling.
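For context on the timeout trade-off: enforcing a timeout needs a Popen object the caller can wait on and kill. A minimal sketch, assuming Python 3 (`communicate(timeout=...)` and `TimeoutExpired` do not exist in Python 2.7, whose `subprocess.call` also took no timeout). `run_with_timeout` is a hypothetical helper, not mozharness's `run_command`, and it kills only the direct child, so a grandchild such as xpcshell under make could outlive it.

```python
import subprocess
import sys

def run_with_timeout(args, timeout, env=None, cwd=None):
    """Run a command, killing it if it exceeds `timeout` seconds.

    Returns (returncode, combined_output); returncode is None on timeout.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, env=env, cwd=cwd)
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                     # stop the direct child only
        output, _ = proc.communicate()  # reap it and collect remaining output
        return None, output
    return proc.returncode, output

code, out = run_with_timeout([sys.executable, "-c", "print('hi')"], timeout=10)
print(code, out.decode().strip())
```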
Cc'ing a few people in case they have any ideas about what's going on here.
Comment 2•10 years ago
I can reproduce the segfault on Fedora 20. The segfault is definitely happening outside of Python, but it's being triggered by how subprocess.py invokes the command when an env param is passed to Popen().
The relevant section of subprocess.py:
if env is None:
    os.execvp(executable, args)
else:
    os.execvpe(executable, args, env)
If you modify that script to omit the env= param completely and instead set those environment variables on the command line, the script works (e.g. DESKTOP=0 DESKTOP_SHIMS=1 DEBUG=1 NOFTU=1 python gaia-make.py). The difference in code path is that without env=, we're using execvp instead of execvpe. If we switch both cases to use execvpe, we still segfault:
if env is None:
    os.execvpe(executable, args, {})
else:
    os.execvpe(executable, args, env)
You'll still get the segfaults regardless of whether or not you specify an env kwarg.
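The practical difference between the two branches is environment inheritance: execvp hands the new program the parent's current environment, while execvpe hands it exactly the mapping given, so `{}` means it starts with no variables at all. The sketch below shows the same effect through subprocess's `env=` parameter, with a child Python process standing in for make and a hypothetical marker variable `GAIA_PROBE`. (Python 3's subprocess execs children from C code rather than through os.execvp/os.execvpe, but `env=` keeps the same semantics.) Which missing variable actually trips up xpcshell is not established in this thread.

```python
import os
import subprocess
import sys

# Child program: print the marker variable if it is visible, else "missing".
PROBE = "import os; print(os.environ.get('GAIA_PROBE', 'missing'))"

def probe_env(env):
    """Run the probe with env=None (inherit) or an explicit mapping."""
    out = subprocess.check_output([sys.executable, "-c", PROBE], env=env)
    return out.decode().strip()

os.environ["GAIA_PROBE"] = "inherited"        # hypothetical marker in the parent
print(probe_env(None))                        # inherits os.environ -> inherited
print(probe_env({}))                          # empty environment   -> missing
print(probe_env({"GAIA_PROBE": "explicit"}))  # only what is passed -> explicit
```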
Switching to .call does *not* fix this. I just verified that a core dump is still being produced. Using .call just doesn't print the segfault notice, because the segfault happens in a subprocess, not in Python itself. On Fedora, you need to run 'ulimit -c unlimited' to get core dumps created in the cwd.
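When the command is launched from Python, a rough in-process counterpart of `ulimit -c unlimited` is to raise the core-file limit in the child before it execs, since resource limits survive exec. A sketch, assuming Linux/POSIX: the hypothetical `raise_core_limit` raises the soft limit only as far as the hard limit allows and affects only the child. Where the core file lands still depends on the system's kernel core_pattern setting.

```python
import resource
import subprocess
import sys

def raise_core_limit():
    """preexec_fn: raise the child's soft core-file limit to its hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

# Child reports the soft core-file limit it ended up with.
PROBE = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE)[0])"

out = subprocess.check_output([sys.executable, "-c", PROBE],
                              preexec_fn=raise_core_limit)
print("child soft limit:", int(out),
      "parent hard limit:", resource.getrlimit(resource.RLIMIT_CORE)[1])
```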
I've also verified that the process that's getting a coredump is Xulrunner:
jhford-w520:~/b2g/gaia $ file core.18553
core.18553: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/home/jhford/b2g/gaia/xulrunner-sdk-30/xulrunner-sdk/bin/xpcshell -f /home/jhfo'
Comment 3•10 years ago
Reduced test case:
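import os
# Inherits the current environment (execvp): no segfault.
#os.execvp('/usr/bin/make', ['make', 'preferences'])
# Passes an explicit, empty environment (execvpe): segfaults.
os.execvpe('/usr/bin/make', ['make', 'preferences'], {})
# The list-argument variants behave the same way: execlp is fine, execlpe segfaults.
#os.execlp('/usr/bin/make', 'make', 'preferences')
#os.execlpe('/usr/bin/make', 'make', 'preferences', {})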
execvp and execlp both avoid the segfault; execvpe and execlpe both segfault.
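Because each reduced case replaces the running Python process, comparing them means editing and rerunning the script. One way to compare variants side by side is to run each in a forked child and decode its wait status. A sketch, assuming POSIX: `run_exec_variant` and `CRASH` are hypothetical, and a Python child that kills itself with SIGSEGV stands in for the crashing `make preferences` run.

```python
import os
import signal
import sys

# Demo child that disables core dumps for itself, then dies with SIGSEGV.
CRASH = ("import os, resource, signal; "
         "resource.setrlimit(resource.RLIMIT_CORE, (0, 0)); "
         "os.kill(os.getpid(), signal.SIGSEGV)")

def run_exec_variant(path, args, env=None):
    """Fork and exec `path` via execvp (env=None) or execvpe (env given),
    then report how the child ended."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            if env is None:
                os.execvp(path, args)        # inherits os.environ
            else:
                os.execvpe(path, args, env)  # uses exactly `env`
        finally:
            os._exit(127)  # reached only if exec itself failed
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "signal " + signal.Signals(os.WTERMSIG(status)).name
    return "exit %d" % os.WEXITSTATUS(status)

print(run_exec_variant(sys.executable, [sys.executable, "-c", "pass"], {}))  # exit 0
print(run_exec_variant(sys.executable, [sys.executable, "-c", CRASH]))       # signal SIGSEGV
```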
Assignee
Comment 4•10 years ago
I'll change the way we invoke the tests to pass the env on the command-line and see if this goes away.
Assignee: nobody → jgriffin
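Passing the environment "on the command line" to make means turning each entry into a `VAR=value` argument. GNU make treats such arguments as variable definitions and, per its manual, exports variables set on the command line to the commands it runs; presumably the process itself is then launched without `env=`, so it inherits the normal environment. A sketch with a hypothetical helper `make_command` (not the actual mozharness patch), using the variables from comment 2:

```python
def make_command(target, env, make="make"):
    """Build a make argv that passes `env` as VAR=value arguments
    instead of as the process environment."""
    args = [make]
    for name in sorted(env):  # sorted for a stable, reproducible argv
        args.append("%s=%s" % (name, env[name]))
    args.append(target)
    return args

gaia_env = {"DESKTOP": "0", "DESKTOP_SHIMS": "1", "DEBUG": "1", "NOFTU": "1"}
print(make_command("preferences", gaia_env))
# ['make', 'DEBUG=1', 'DESKTOP=0', 'DESKTOP_SHIMS=1', 'NOFTU=1', 'preferences']
```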
Assignee
Comment 5•10 years ago
I think this should have the desired effect.
Attachment #8445565 - Flags: review?(jhford)
Comment 6•10 years ago
Comment on attachment 8445565: Pass env variables on the command-line
Looks good to me! Can we add:
self.info('Sending environment as make vars because of bug 1028816')
I'm sure that eventually someone will wonder why on earth the environment they are setting is not being set as the actual process environment.
Other than that, the only issue I see is that the copy-and-paste argv output from mozharness could be broken by an environment variable that contains a space.
Attachment #8445565 - Flags: review?(jhford) → review+
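The copy-and-paste concern can be addressed by shell-quoting each argument when logging the command. A sketch using `shlex.quote` (Python 3.3 and later); `printable_command` and the `GAIA_OPTS` variable are hypothetical, chosen only to show a value containing a space.

```python
import shlex

def printable_command(argv):
    """Render argv as a shell-safe string that can be copied and pasted."""
    return " ".join(shlex.quote(arg) for arg in argv)

argv = ["make", "GAIA_OPTS=a b", "DEBUG=1", "preferences"]
print(printable_command(argv))
# make 'GAIA_OPTS=a b' DEBUG=1 preferences
```

Because the quoting is reversible, `shlex.split` on the logged string gives back the original argv.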
Assignee
Comment 7•10 years ago
Comment on attachment 8445565: Pass env variables on the command-line
Addressed review comments: https://hg.mozilla.org/build/mozharness/rev/c4e0534d7b2e
Attachment #8445565 - Flags: checked-in+
Reporter
Comment 8•10 years ago
:jgriffin, I saw the commit has been checked in to the mozharness repository[1], but we still get the segfault on the try server. Does the try server use the same version as the repository?
[1] https://hg.mozilla.org/build/mozharness/summary
Flags: needinfo?(jgriffin)
Comment 9•10 years ago
So there are two issues at play here:
1) we use the production branch of mozharness on Gaia-Try most of the time
2) we are using hg.m.o/users/jford_mozilla.com/mozharness on the default branch because there is no staging environment and I need to test a mozharness+gaia-try change
Can you link to a log of the new failure? I'm curious whether something at a lower level than the patch jgriffin worked on is doing something like "def func(env={})" or "if not env: env = {}" or some similar magic.
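The patterns jhford mentions matter because subprocess treats env=None ("inherit the parent's environment") and env={} ("start with no variables") very differently, and both a `{}` default and `if not env: env = {}` quietly turn the first into the second. A sketch of the pitfall and a safer pattern; both function names are hypothetical.

```python
def normalize_env_buggy(env=None):
    """The suspected pattern: "if not env" treats None and {} alike,
    so "inherit the environment" silently becomes "empty environment"."""
    if not env:
        env = {}
    return env

def normalize_env(env=None):
    """Keep None meaning "inherit"; copy real mappings so the caller's
    dict is never mutated further down the call chain."""
    if env is None:
        return None
    return dict(env)

print(normalize_env_buggy(None))  # {}   -> Popen(env={}) starts with no variables
print(normalize_env(None))        # None -> Popen(env=None) inherits os.environ
```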
Assignee
Comment 10•10 years ago
(In reply to Yuren [:yurenju] from comment #8)
> :jgriffin, I saw the commit has been checked in in mozharness repository[1]
> but we still get segfault on try server, do we use the same version as the
> repository on try server?
>
> [1] https://hg.mozilla.org/build/mozharness/summary
The patch has been merged to jford_mozilla.com/mozharness, so it should be used for Gaia-Try runs. If it's not working, there may be some other problem involved. As jhford said, can you provide a link to a failing log?
Flags: needinfo?(jgriffin)
Reporter
Comment 11•10 years ago
:jgriffin and :jhford, thanks for your help; the pull request looks good now! :D
https://tbpl.mozilla.org/?rev=632529bd820393ff1a16ec29cfd78d5478803ce6&tree=Gaia-Try
Reporter
Comment 12•10 years ago
Closing this bug, thanks all!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard