Closed Bug 1338651 Opened 8 years ago Closed 7 years ago

taskcluster cross-compiled OS X builds create perf problems when not stripped (Talos performance regressions vs. buildbot builds)

Categories

(Firefox Build System :: General, defect)

Version: 53 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox-esr52 wontfix, firefox54 wontfix, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

RESOLVED FIXED
mozilla56

People

(Reporter: jmaher, Assigned: wcosta)

References

Details

(Keywords: perf, regression, talos-regression)

User Story

Taskcluster OS X builds are cross-compiled from a Linux Docker image. The build is done in a path under '/home/', which shows up in the symbol table of the resulting binary as STAB entries referencing the object files. Some system libraries on OS X will attempt to stat the files in those entries, which can cause noticeable performance issues. This has shown up as a result of the sandboxing system causing this behavior while reporting violations, which caused Talos regressions, as well as timeouts in GTest death tests due to the system crash reporting system causing this behavior.
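
As a rough illustration of the mechanism described above, the leftover stab entries can be inspected with the standard macOS binary tools (a minimal sketch; the XUL path and the '/home/' prefix are assumptions based on the description above, and strip is shown as the general remedy, not necessarily the exact fix that landed):

# List debugger-only (stab) symbols; N_OSO entries record the absolute
# paths of the object files used at link time.
nm -ap Firefox.app/Contents/MacOS/XUL | grep 'OSO' | head

# Count how many of those paths point into the Linux cross-compile workspace.
nm -ap Firefox.app/Contents/MacOS/XUL | grep 'OSO' | grep -c '/home/'

# Removing the debugging symbol entries (including stabs) prevents system
# libraries from trying to stat those nonexistent paths.
strip -S Firefox.app/Contents/MacOS/XUL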

Attachments

(4 files, 3 obsolete files)

Talos has detected a Firefox performance regression from push 9f402afa3864572af68f27eb3e16295da02841b4. As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 34%  sessionrestore_no_auto_restore osx-10-10 opt e10s     968.42 -> 1293.5
 32%  sessionrestore osx-10-10 opt e10s                     928.83 -> 1230.25
 31%  damp summary osx-10-10 opt e10s                       300.83 -> 395.58
 31%  tsvgr_opacity summary osx-10-10 opt e10s              351.47 -> 460.9
 29%  a11yr summary osx-10-10 opt e10s                      469.52 -> 607.59
 28%  ts_paint osx-10-10 opt e10s                           1223 -> 1570.58
 27%  tart summary osx-10-10 opt e10s                       11.87 -> 15.07
 23%  cart summary osx-10-10 opt e10s                       36.84 -> 45.37
 22%  kraken summary osx-10-10 opt e10s                     1604.32 -> 1963
 19%  tscrollx summary osx-10-10 opt e10s                   3.16 -> 3.74
 17%  tsvgx summary osx-10-10 opt e10s                      445.94 -> 522.67
 17%  tabpaint summary osx-10-10 opt e10s                   62.76 -> 73.33
 17%  tsvg_static summary osx-10-10 opt e10s                69.25 -> 80.72
 16%  tps summary osx-10-10 opt e10s                        25.65 -> 29.71
 15%  tcanvasmark summary osx-10-10 opt e10s                6273.62 -> 5335.83
 12%  tpaint osx-10-10 opt e10s                             333.73 -> 374.12
 12%  tresize osx-10-10 opt e10s                            29.28 -> 32.65

Improvements:

  2%  tp5o Main_RSS osx-10-10 opt      398745663.13 -> 388868909.48


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=5075

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Oh boy, running Talos tests via BBB against a Taskcluster OS X build yields a large volume of performance regressions. What is scarier is that the tests are now very noisy, so they become less useful.
Ted, do you have thoughts on how we could figure out the perf regression by testing a Taskcluster build against a buildbot build?
Flags: needinfo?(ted)
This is unexpected and unfortunate! Here are the obvious things I think we should sanity check first:
* Verify that we're using the same compiler/linker flags in both builds
* Verify that we're using the same compiler/linker version in both builds
* Verify that other mozconfig build options are the same between both builds

If those all check out then my next thought would be to take one of those tests that shows a large difference and have someone run a pair of builds through it on their local machine, and ensure they can reproduce the results. If so, they can profile both builds running the test and see what jumps out.
Flags: needinfo?(ted)
One thing to think about: these are cross-compiled builds on Linux vs. building natively on OS X. Will we be able to realistically compare flags/toolchains?
Yes. We're using the same compiler (clang) and same binutils (cctools) for both builds.
Do we need to revisit bug 1314154 in light of this?
possibly, that difference will turn up when we start working on this.
Can we do a native build with the startup cache disabled to measure its influence?
Flags: needinfo?(jmaher)
great idea, I have no clue how we even messed up the startup cache.  Ted, do you know how to do this?

I would do a baseline build:
try -b o -p macosx64 -u none -t all --rebuild 5

then a startup cache change and repeat:
try -b o -p macosx64 -u none -t all --rebuild 5

finally compare both.
Flags: needinfo?(jmaher) → needinfo?(ted)
We just chatted about this on IRC, but the thing that the TC builds don't do is *pre-generate* the startupcache during packaging, because they can't run the xpcshell that was built. To force a buildbot build to act that way you could just change this conditional to `if False:` in a try build:
https://dxr.mozilla.org/mozilla-central/rev/e1135c6fdc9bcd80d38f7285b269e030716dcb72/toolkit/mozapps/installer/packager.py#375
Flags: needinfo?(ted)
:jmaher, what is the special try syntax to run talos via BBB against a Taskcluster OSX build (instead of the native buildbot build/job)?
Flags: needinfo?(jmaher)
I believe you need a special patch, possibly wcosta has prior art to share?
Flags: needinfo?(jmaher) → needinfo?(wcosta)
(In reply to Joel Maher ( :jmaher) from comment #12)
> I believe you need a special patch, possibly wcosta has prior art to share?

You need to uncomment macosx64/opt inside test-platforms.yml
Flags: needinfo?(wcosta)
Thanks to :wcosta, we have a push that disables the startup cache for buildbot, and we have Talos data, so we can compare against the baseline with it enabled:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=34c6c2f302e7b48e3ad2cec575cbd34d423a9d32&newProject=try&newRevision=1b40ec069ce9226e107a4a3eda505bc9b08db6ac&framework=1&filter=osx&showOnlyImportant=0

There are no changes here, which indicates the startup cache is not responsible for the Talos regressions.
That is good to know, thanks wcosta! I think we need to work through my checklist in comment 3.
One more thing that came to mind: I added a cross-built libc++ in bug 1273981. Assuming it links statically, it may not be an optimal build. One more thing to keep an eye on.
Keep in mind this is only an e10s issue, not non-e10s (not sure why), so we need to look at the e10s tests. (And a note: we can reduce try usage in the future and focus on specific tests for the time being: try: -b o -p macosx64 -u none -t chromez-e10s,other-e10s)


There are still issues with the profiler and Talos, but the other-e10s job results in a tabpaint.zip profile (ignore all the others; they are invalid).

bb: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/8ad17a929b724b4dbee024d15460cc84d41871c646b095413e4cc8fbc69f8a2a3a155e132ae397c1a56a89c05d16775a531b4b0992c2ccdd91010f49c55bbe3e
tc: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/419d390d79e1bddd271eeb5d5165d15a4077a5d9374abf1cddebc4e19fc24cd0b64338df833f72c027b20e926d172396045d08917c91f55f1d24a4a2c7df45c0

I downloaded these locally, unpacked them, then loaded the cycle_0.profile file into https://cleopatra.io/

looking at this, I see that BB is 1100ms faster than TC builds in starting the DOM work on the child process.  That seems odd.

In the child process, I see:
BB: 383 calls to mozilla::ipc::messagepump::run(), 1 call to nsTimer::cancel
TC: 440 calls to mozilla::ipc::messagepump::run()

odd that we call nsTimer::cancel for BB, maybe this is a side effect of sampled profiling.

BB: 1 call in the nsAppShell::Run() to mozilla::dom::power::powermanagerservice::getInstance()


most time is spent:
BB: mach__msg_trap 93.8%
TC: mach__msg_trap 91.8%

I am not sure this helps at all, but possibly someone else familiar with the Gecko Profiler can use this data and find interesting bits.
`mach__msg_trap` is basically the syscall interface to the kernel, it just means the process is waiting on something, which is pretty normal for Firefox (it usually means we're waiting in the event loop). If we're spending slightly less % time there in the TC builds it means we're spending more time using the CPU, so there's something going wrong there. The 1100ms thing is interesting--is it taking longer to start the content process in the TC build? If so that would explain a lot of this.
I put on a Platform Engineer hat and started to investigate tsvgr_opacity; it looks like there is no single place causing the performance loss. The most penalized part seems to be the JS engine. My guess is that the cross-compiler is indeed generating slower code, and the JS engine suffers more just because it is a top CPU-bound component. I will look at tabpaint too.
that might point to compiler options
(In reply to Joel Maher ( :jmaher) from comment #21)
> that might point to compiler options

Agreed. Maybe we should verify Ted's assumption in comment #5?
Attached patch functions.diffSplinter Review
This attachment contains the list of functions present in the native-build libXUL but not found in the cross-compiled libXUL. One possibility is that these functions implement high-performance operations using special processor instructions, and the build system queries the environment for the availability of such instructions, or the cross build simply lacks some definitions to enable them. Just speculating.

I just generated the file; I will investigate further.
Another data point: the native build uses clang 3.8.0, the cross build uses clang 3.9.0. Yes, the slower build uses the newer compiler...
Ah, I looked at some samples of generated assembly code and it looks like both compilers generate the same code.
How did you generate the diff in comment 23? There are things in there that seem like the build would be entirely broken if they were actually missing.
But also, yeah, we should make both builds use the same version of clang or we're not going to have a fair comparison *at all*.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #26)
> How did you generate the diff in comment 23? There are things in there that
> seem like the build would be entirely broken if they were actually missing.

# Generate the function list for x-build
objdump -disassemble XUL | sed -e '/[^:]$/d' XUL_bbb.s | sed -e '/^$/d' -e 's/://' | sort | uniq > bbb_functions.txt

# Generate the function list for the native build
objdump -disassemble XUL | sed -e '/[^:]$/d' XUL_bb.s | sed -e '/^$/d' -e 's/://' | sort | uniq > bb_functions.txt

diff bbb_functions.txt bb_functions.txt > functions.diff

Does it sound right?
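
For what it's worth, a wrinkle in the pipelines above is that sed is given a filename (XUL_bbb.s / XUL_bb.s), so it reads that file and ignores the piped objdump output. A streamlined sketch of the same idea (XUL.tc and XUL.bb are hypothetical names for the two copies of XUL):

# Extract the symbol labels (lines ending in ':'), then diff the sorted lists.
objdump -disassemble XUL.tc | sed -e '/[^:]$/d' -e '/^$/d' -e 's/://' | sort -u > bbb_functions.txt
objdump -disassemble XUL.bb | sed -e '/[^:]$/d' -e '/^$/d' -e 's/://' | sort -u > bb_functions.txt
diff bbb_functions.txt bb_functions.txt > functions.diff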
I have been investigating build differences, this is what I found:

MOZ_AUTOMATION_L10N_CHECK is 0 on cross and 1 on native build.

native uses clang 3.8.0, cross uses 3.9.0

native has the build option --enable-dtrace
cross has the linker flag -Wl,-dead_strip while native has -Wl,-no_data_in_code_info
These are related to bug 403132
I'm comparing the mozharness talos output logs for talos runs on native vs tc-build-x builds.

14:26:29     INFO - Using buildbot properties:
14:26:29     INFO - {
... 

The native build has:

14:26:29     INFO -     "pgo_build": "False",

But the tc-bbb-build-x doesn't have that flag listed. It probably doesn't make a difference, but noting it just in case...
from irc we were going to run damp in a reduced test locally to see if we can reproduce.  just about all the damp subtests regress:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=autoland&originalRevision=9f23fb946723b47b9a1e1b1c82c6c3556e0354c7&newProject=autoland&newRevision=9f402afa3864572af68f27eb3e16295da02841b4&originalSignature=9663a154caf9b713fb951c8db24518c7e0ceddce&newSignature=9663a154caf9b713fb951c8db24518c7e0ceddce&framework=1

so we should be able to narrow it down nicely.

to run damp locally, visit:
https://hg.mozilla.org/mozilla-central/file/tip/testing/talos/talos/tests/devtools/addon/devtools-signed.xpi  (go to raw and it will install)

then restart the browser.

Finally, browse to:
chrome://damp/content/damp.html

and you can run in the browser.

Otherwise, hack up damp manifest to reduce the runtime and push to try :)

---------------------

from comment 29, I am curious about:
native has the build option --enable-dtrace
cross has the linker flag -Wl,-dead_strip while native has -Wl,-no_data_in_code_info
(In reply to Joel Maher ( :jmaher) from comment #31)
> Otherwise, hack up damp manifest to reduce the runtime and push to try :)

Can't we just dry it off? ;)
In order to try to reduce the regression to a single test case, I downloaded a native buildbot osx10.10 build from a try run, and a build from taskcluster/buildbotbridge. I ran the Talos DAMP test several times locally (with 25 repeats) on my Mac running OS X 10.11.6. I looked at all the DAMP subtests each time, and I can't seem to reproduce the regression locally consistently. Moving back to try.
Reducing down to Talos DAMP simple inspector test cases only doesn't reproduce the regression on try:

https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood@mozilla.com&fromchange=909b9c6364ccb5a3fa7e61a495d67f9484845654
This is my last run: I disabled the --enable-dtrace option, made both builds use the -Wl,-dead_strip linker option, and downgraded the compiler to 3.8.0 in TC.
Summary: 11.5 - 33.57% * (osx-10-10) regression on push 9f402afa3864572af68f27eb3e16295da02841b4 (Fri Feb 10 2017) → taskcluster cross-compiled OS X builds show Talos performance regressions vs. buildbot builds
(In reply to Wander Lairson Costa [:wcosta] from comment #29)
> native uses clang 3.8.0, cross uses 3.9.0

Before we dive into anything else we should get both of these builds on the same exact version of clang and compare *that*. It's entirely possible that we're chasing a regression that's completely due to using a newer version of clang. If that's so, we can just downgrade the taskcluster builds to the version the buildbot builds are using and let someone on platform sort out the performance issues with clang 3.9.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #38)
> (In reply to Wander Lairson Costa [:wcosta] from comment #29)
> > native uses clang 3.8.0, cross uses 3.9.0
> 
> Before we dive into anything else we should get both of these builds on the
> same exact version of clang and compare *that*. It's entirely possible that
> we're chasing a regression that's completely due to using a newer version of
> clang. If that's so, we can just downgrade the taskcluster builds to the
> version the buildbot builds are using and let someone on platform sort out
> the performance issues with clang 3.9.

In comment 37 I have results with clang 3.8
ok, so that isn't helping us then :(
for the raw perfherder data,

native:
1900.75, 1290.795, 1887.9100000000035, 1290.505000000001, 1146.380000000001, 1129.3899999999994, 1249.8050000000003, 1306.6449999999968, 1249.010000000002, 1179.9249999999956, 1124.1549999999988, 1295.449999999997, 1195.9100000000035, 1301.760000000002, 1269.8700000000026, 1195.2900000000009, 1448.510000000002, 1312.195000000007, 1306.125, 1322.9550000000017, 1132.9349999999977, 1461.8300000000017, 1141.2600000000093, 1316.804999999993, 1145.7550000000047

x-compile:
[2151.83, 1287.2200000000012, 1348.9799999999996, 1486.625, 1320.5149999999994, 1456.5900000000001, 1255.3250000000007, 1573.0049999999974, 1497.7099999999991, 1399.1199999999953, 1381.4800000000032, 1421.4649999999965, 1460.8299999999945, 1373.7699999999968, 1699.6200000000026, 1526.064999999995, 1405.7249999999985, 1386.4249999999884, 1362.8650000000052, 1524.2599999999948, 1490.074999999997, 1597.7350000000006, 1526.175000000003, 1500.3399999999965, 1472.3600000000006]


Somehow x-compile continues to be 200+ms slower than native for raw data.
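
For reference, a quick way to turn those raw series into a single comparable number (a sketch; it assumes the two series above have been saved one value per line to native.txt and cross.txt):

# Print the median of a newline-separated series of numbers.
median() {
  sort -n "$1" | awk '{ v[NR] = $1 }
    END { print (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2 }'
}
median native.txt   # buildbot (native) build
median cross.txt    # taskcluster (cross-compiled) build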
Investigating the logs, the tc build has a much higher occurrence rate of the following log message:

1251 INFO - PID 1965 | ###!!! [Child][MessageChannel] Error: (msgtype=0x8E0025,name=PNecko::Msg_RemoveRequestContext)                                                               |Channel closing: too late to send/recv, messages will be lost

And this is causing delays in g2 test.
:dvander could you help figure out what's happening here?
(I randomly picked you from git blame MessageChannel.cpp :) )
Flags: needinfo?(dvander)
This usually means one endpoint of an IPC channel has sent a "shutdown" message to the other side. It is processed on the I/O thread, and then the main thread, and in between there is a window where sending messages will yield this error.

It's fairly harmless, and appears on actors that don't hand-shake their shutdown order.
Flags: needinfo?(dvander)
(In reply to Robert Wood [:rwood] from comment #41)
> Narrowing down the Talos DAMP test to a single test case (inspectorOpen,
> openToolboxAndLog only) does reproduce the regression.

Thanks, that's useful for digging in!

(In reply to Wander Lairson Costa [:wcosta] from comment #43)
> Investigating the logs, the tc build has a much higher occurrence rate of
> the following log message:
> 
> 1251 INFO - PID 1965 | ###!!! [Child][MessageChannel] Error:
> (msgtype=0x8E0025,name=PNecko::Msg_RemoveRequestContext)                    
> |Channel closing: too late to send/recv, messages will be lost
> 
> And this is causing delays in g2 test.

This is also useful info! It sounds like maybe something is not working properly at the IPC layer in the taskcluster builds.

Looking at the DAMP tests (which I only just now learned are devtools tests), the code that gets run during `openToolbox` does seem to do a lot of IPC messaging in the e10s case, so if IPC is somehow slower or broken I could absolutely believe that it would show up as a performance regression in these tests. I'll try to dig around in this area and see what I can find.
(In reply to David Anderson [:dvander] from comment #45)
> This usually means one endpoint of an IPC channel has sent a "shutdown"
> message to the other side. It is processed on the I/O thread, and then the
> main thread, and in between there is a window where sending messages will
> yield this error.
> 
> It's fairly harmless, and appears on actors that don't hand-shake their
> shutdown order.

A more recent update: I see those messages in both logs, but in BB only one message shows up, while many show up in the TC build. My guess is that for some reason the TC-built binary has higher IPC latency than BB.
I just updated the m-c branch, added rwood's patch to narrow down the problem, and the error message is gone, although the performance regression still holds.
Given that this is an e10s-only regression and we see a lot of IPC-related messages, maybe we should look at OS X-specific IPC code as a starting point?

I wouldn't know how to find out whether there is OS X-specific code; it would be good to know either way.
There's definitely OSX-specific IPC code, like:
https://dxr.mozilla.org/mozilla-central/rev/ff04d410e74b69acfab17ef7e73e7397602d5a68/ipc/glue/moz.build#107

I don't think anything is obviously *broken* with it, though, or the builds wouldn't work. We've got green debug tests, so it's clearly working. It's possible there's just some difference here that's making it *slower*.

I ran it by froydnj on IRC and he suggested that it could be something odd like using a different version of libc++ which could cause a perf difference in low-level code like IPC.
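
One quick sanity check for the libc++ theory (a sketch, assuming both .app bundles have been unpacked locally; otool ships with Xcode/cctools):

# Show which C++ runtime each XUL links against, and at which version.
# A statically linked libc++ would not appear here at all.
otool -L bb/Firefox.app/Contents/MacOS/XUL | grep -i 'c++'
otool -L tc/Firefox.app/Contents/MacOS/XUL | grep -i 'c++'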
Attached file firefox-bin.tc.s.gz (obsolete) —
Assignee: nobody → wcosta
Attachment #8849596 - Attachment is obsolete: true
See Also: → 1337955
Ok, this is what I did so far:

* Remove enable-dtrace flag in BB since it is not enabled on cross.
* Disable startup-cache in BB as well, as cross can't generate it.
  These two items were just to narrow the gap between BB and cross build flags.
* Added -Wl,-no_data_in_code_info to cross build.
* Target cross builds to darwin 11.2.0 to match BB target.
* Use yasm 1.3.0 on cross build, same version as in BB.
* Removed AC_FUNC_MEMCMP from configure, since its check can't run in a cross build and the docs say it is obsolete [1]
* Set ac_cv_cpp_dynamic_cast_void_ptr to 'yes' in cross build.
* Downgrade clang to 3.8.0 in cross build (same version as BB).

None of these had any impact on performance; a reminder that the performance regression only happens with e10s.

Looking at the assembly code, I see the compiler generates different binary code for the cross and BB builds.
For example, this function in libXUL has different generated code, although the compiler command lines, apart from the cross-build flags, are identical for the BB and cross builds.

cross:
__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE
    	pushq	%rbp
    	movq	%rsp, %rbp
    	pushq	%r15
    	pushq	%r14
    	pushq	%r12
    	pushq	%rbx
    	subq	$32, %rsp
    	movq	%rcx, %r12
    	movq	104(%r12), %rax
    	addq	$7, %rax
    	cmpq	$8191, %rax
    	jbe	8 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+2B>
    	xorl	%r14d, %r14d
    	jmp	345 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+184>
    	movzwl	%di, %ecx
    	xorl	%r14d, %r14d
    	cmpl	$15, %ecx
    	ja	330 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+184>
    	testb	%sil, %sil
    	js	321 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+184>
    	movb	%dl, %bl
    	decb	%bl
    	movzbl	%bl, %edi
    	cmpl	$3, %edi
    	ja	305 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+184>
    	movw	$61951, -48(%rbp)
    	shlb	$6, %dl
    	movzbl	%dl, %edx
    	shlb	$2, %sil
    	movzbl	%sil, %esi
    	movl	%ecx, %edi
    	shrl	$2, %edi
    	addl	%esi, %edi
    	leal	192(%rdx,%rdi), %edx
    	movb	%dl, -46(%rbp)
    	shll	$6, %ecx
    	movl	%eax, %edx
    	shrl	$11, %edx
    	addl	%ecx, %edx
    	movb	%dl, -45(%rbp)
    	movq	%rax, %rcx
    	shrq	$3, %rcx
    	movb	%cl, -44(%rbp)
    	shlq	$5, %rax
    	orl	$31, %eax
    	movb	%al, -43(%rbp)
    	movb	$-4, -42(%rbp)
    	movq	(%r12), %rax
    	movq	%r12, %rdi
    	callq	*24(%rax)
    	movq	%rax, -64(%rbp)
    	movq	%rax, -40(%rbp)
    	movq	%rax, -56(%rbp)
    	movq	-56(%rbp), %rdi
    	leaq	-48(%rbp), %rsi
    	movl	$7, %edx
    	callq	26175531 <__ZN7mozilla18MediaRawDataWriter7PrependEPKhm>
    	testb	%al, %al
    	je	56 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+101>
    	movq	64(%r12), %rax
    	movb	$1, %r14b
    	cmpb	$0, (%rax)
    	je	156 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+176>
    	movq	24(%rax), %rax
    	movq	-56(%rbp), %rcx
    	movq	(%rcx), %r15
    	addq	$24, %r15
    	cmpl	$0, (%rax)
    	je	24 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+106>
    	movq	(%r15), %rax
    	cmpl	$0, (%rax)
    	je	187 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+1B5>
    	addw	$7, 8(%rax)
    	jmp	117 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+176>
    	xorl	%r14d, %r14d
    	jmp	112 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+176>
    	movq	(%r15), %rax
    	movl	(%rax), %esi
    	incq	%rsi
    	movl	$2, %edx
    	movq	%r15, %rdi
    	callq	126149 <__ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14EnsureCapacityIS0_EENT_15ResultTypeProxyEmm>
    	movq	(%r15), %rax
    	movl	(%rax), %ecx
    	movw	$7, 8(%rax,%rcx,2)
    	movq	(%r15), %rax
    	cmpq	75996351(%rip), %rax
    	je	97 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+194>
    	incl	(%rax)
    	movq	-56(%rbp), %rax
    	movq	(%rax), %rbx
    	leaq	32(%rbx), %rdi
    	movl	104(%r12), %r15d
    	addl	$-7, %r15d
    	movq	32(%rbx), %rax
    	movl	(%rax), %esi
    	incq	%rsi
    	movl	$4, %edx
    	callq	126084 <__ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14EnsureCapacityIS0_EENT_15ResultTypeProxyEmm>
    	movq	32(%rbx), %rax
    	movl	(%rax), %ecx
    	movl	%r15d, 8(%rax,%rcx,4)
    	movq	32(%rbx), %rax
    	cmpq	75996286(%rip), %rax
    	je	32 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+194>
    	incl	(%rax)
    	movq	-56(%rbp), %rdi
    	testq	%rdi, %rdi
    	je	5 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+184>
    	callq	61442842
    	movb	%r14b, %al
    	addq	$32, %rsp
    	popq	%rbx
    	popq	%r12
    	popq	%r14
    	popq	%r15
    	popq	%rbp
    	retq
    	leaq	64693937(%rip), %rax
    	movq	75999454(%rip), %rcx
    	movq	%rax, (%rcx)
    	movl	$451, 0
    	callq	61438023
    	xorl	%edi, %edi
    	xorl	%esi, %esi
    	callq	549516 <__Z23InvalidArrayIndex_CRASHmm>
    	nop

bb:
__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE
    	pushq	%rbp
    	movq	%rsp, %rbp
    	pushq	%r15
    	pushq	%r14
    	pushq	%rbx
    	pushq	%rax
    	movq	%rcx, %r15
    	movq	104(%r15), %rax
    	addq	$7, %rax
    	cmpq	$8191, %rax
    	jbe	7 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+24>
    	xorl	%ebx, %ebx
    	jmp	318 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+162>
    	movzwl	%di, %edi
    	xorl	%ebx, %ebx
    	cmpl	$15, %edi
    	ja	304 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+162>
    	testb	%sil, %sil
    	js	295 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+162>
    	movb	%dl, %cl
    	decb	%cl
    	movzbl	%cl, %ecx
    	cmpl	$3, %ecx
    	ja	279 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+162>
    	movw	$61951, -32(%rbp)
    	shlb	$6, %dl
    	movzbl	%dl, %ecx
    	shlb	$2, %sil
    	movzbl	%sil, %edx
    	movl	%edi, %esi
    	shrl	$2, %esi
    	addl	%edx, %esi
    	leal	192(%rcx,%rsi), %ecx
    	movb	%cl, -30(%rbp)
    	shll	$6, %edi
    	movl	%eax, %ecx
    	shrl	$11, %ecx
    	addl	%edi, %ecx
    	movb	%cl, -29(%rbp)
    	movq	%rax, %rcx
    	shrq	$3, %rcx
    	movb	%cl, -28(%rbp)
    	shlq	$5, %rax
    	orl	$31, %eax
    	movb	%al, -27(%rbp)
    	movb	$-4, -26(%rbp)
    	movq	(%r15), %rax
    	movq	%r15, %rdi
    	callq	*24(%rax)
    	movq	%rax, %r14
    	leaq	-32(%rbp), %rsi
    	movl	$7, %edx
    	movq	%r14, %rdi
    	callq	25502366 <__ZN7mozilla18MediaRawDataWriter7PrependEPKhm>
    	testb	%al, %al
    	je	46 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+E4>
    	movq	64(%r15), %rax
    	movb	$1, %bl
    	cmpb	$0, (%rax)
    	je	37 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+E6>
    	movq	24(%rax), %rax
    	movq	(%r14), %rbx
    	addq	$24, %rbx
    	cmpl	$0, (%rax)
    	je	28 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+ED>
    	movq	(%rbx), %rax
    	cmpl	$0, (%rax)
    	je	179 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+190>
    	addw	$7, 8(%rax)
    	jmp	116 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+158>
    	xorl	%ebx, %ebx
    	testq	%r14, %r14
    	jne	111 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+15A>
    	jmp	117 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+162>
    	movq	(%rbx), %rax
    	movl	(%rax), %esi
    	incq	%rsi
    	movl	$2, %edx
    	movq	%rbx, %rdi
    	callq	125662 <__ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14EnsureCapacityIS0_EENT_15ResultTypeProxyEmm>
    	movq	(%rbx), %rax
    	movl	(%rax), %ecx
    	movw	$7, 8(%rax,%rcx,2)
    	movq	(%rbx), %rax
    	cmpq	73332912(%rip), %rax
    	je	85 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+16F>
    	incl	(%rax)
    	movq	(%r14), %rbx
    	leaq	32(%rbx), %rdi
    	movl	104(%r15), %r15d
    	addl	$-7, %r15d
    	movq	32(%rbx), %rax
    	movl	(%rax), %esi
    	incq	%rsi
    	movl	$4, %edx
    	callq	125602 <__ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14EnsureCapacityIS0_EENT_15ResultTypeProxyEmm>
    	movq	32(%rbx), %rax
    	movl	(%rax), %ecx
    	movl	%r15d, 8(%rax,%rcx,4)
    	movq	32(%rbx), %rax
    	cmpq	73332852(%rip), %rax
    	je	25 <__ZN11mp4_demuxer4Adts13ConvertSampleEtaaPN7mozilla12MediaRawDataE+16F>
    	incl	(%rax)
    	movb	$1, %bl
    	movq	%r14, %rdi
    	callq	59100812
    	movb	%bl, %al
    	addq	$8, %rsp
    	popq	%rbx
    	popq	%r14
    	popq	%r15
    	popq	%rbp
    	retq
    	leaq	62152870(%rip), %rax
    	movq	73336107(%rip), %rcx
    	movq	%rax, (%rcx)
    	movl	$451, 0
    	callq	59096014
    	xorl	%edi, %edi
    	xorl	%esi, %esi
    	callq	543777 <__Z23InvalidArrayIndex_CRASHmm>
    	nopl	(%rax)

If you want to look for yourself, download the dmg images for both the BB and taskcluster builds to a local directory and run this script [2] there.
My patches can be found here [3] (not all patches are fixes, some are just for tests).

NI people that I think might have some ideas.

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.60/html_node/Particular-Functions.html
[2] https://gist.github.com/walac/11b9579378ee83f1eedd40b8d402768c
[3] https://github.com/walac/gecko-dev/commits/mac-talos
Flags: needinfo?(nfroyd)
Flags: needinfo?(jmathies)
Total shot in the dark but considering you pasted a piece of the mp4_demuxer code, I wonder if this is related to rust using a different linker in cross builds vs BB builds? I don't know why it would only affect e10s though. There are details on the linker difference in bug 1342503 comment 8 and in bug 1329737.
There are some minor register allocation differences, but those shouldn't be significant enough to affect talos results.

I do see obvious cases where the cross code is much worse:

bb:
        callq    *24(%rax)
        movq    %rax, %r14
        leaq    -32(%rbp), %rsi
        movl    $7, %edx
        movq    %r14, %rdi

cross:
        callq    *24(%rax)
        movq    %rax, -64(%rbp)
        movq    %rax, -40(%rbp)
        movq    %rax, -56(%rbp)
        movq    -56(%rbp), %rdi
        leaq    -48(%rbp), %rsi
        movl    $7, %edx

What are all those movq's doing in there?!  That extra memory traffic can't be helpful for performance.

My next suggestion would be to try and compare the preprocessed source files wherever possible; the file containing the above code:

http://dxr.mozilla.org/mozilla-central/source/media/libstagefright/binding/Adts.cpp

might be a decent place to start.  Then you can carefully determine if differences in the preprocessed code (ruling out things like #line directives that are going to obviously differ due to path differences) are affecting the code(s) in question.  That would confirm or rule out hypotheses like comment 50.

OTOH, it's possible that clang/LLVM just generate different code in native vs. cross configurations due to Reasons.  Obviously you don't want your compiler to do that, but it's not easy to test for...
Oh, and possibly easier than staring at preprocessed code...given that all of the problems are e10s related, maybe there's something about the binary layout of our libraries/executables that the OS X loader doesn't like, or something?  Should check whether Mach-O headers, etc. etc. look reasonable when comparing the two builds.
Flags: needinfo?(nfroyd)
Or given that e10s is a problem, perhaps trawling through preprocessed sources for code under ipc/ would yield some interesting results.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #53)
> Total shot in the dark but considering you pasted a piece of the mp4_demuxer
> code, I wonder if this is related to rust using a different linker in cross
> builds vs BB builds? I don't know why it would only affect e10s though.
> There are details on the linker difference in bug 1342503 comment 8 and in
> bug 1329737.

Is there an easy way to verify that in a loaner?
Flags: needinfo?(bugmail)
(In reply to Nathan Froyd [:froydnj] from comment #56)
> Or given that e10s is a problem, perhaps trawling through preprocessed
> sources for code under ipc/ would yield some interesting results.

I looked at the preprocessed files under libstagefright and ipc; they are equal. As for Mach-O, I have taken a look: there are some different values in some fields, but nothing that I can tell would lead to a regression. However, I am not a Mach-O expert.
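
For anyone who wants to repeat the Mach-O comparison, a minimal sketch (assuming the two XUL copies are at bb/XUL and tc/XUL; otool is part of cctools/Xcode):

# Diff the Mach-O header and load commands of the two builds; differing
# fields (UUIDs, segment sizes, code-signature load commands) show up directly.
diff -u <(otool -hl bb/XUL) <(otool -hl tc/XUL) | head -100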
(In reply to Wander Lairson Costa [:wcosta] from comment #57)
> Is there an easy way to verify that in a loaner?

On the buildbot loaner you can force rust/cargo to use a non-system-linker using the workaround in bug 1342503 comment 12. However you might need to do something similar on the TC loaner to make *it* also use a known linker, I'm not sure. I think that on both builds it will try to use the system linker, and the buildbot slave just has an older system linker.
Flags: needinfo?(bugmail)
That shouldn't matter--if it was using the wrong linker the build would just fail because the system linker is an ELF linker and it needs a Mach-O linker. The code in libxul gets linked by the Mozilla build system, not cargo.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #46)
> (In reply to Robert Wood [:rwood] from comment #41)
> > Narrowing down the Talos DAMP test to a single test case (inspectorOpen,
> > openToolboxAndLog only) does reproduce the regression.
> 
> Thanks, that's useful for digging in!
> 
> (In reply to Wander Lairson Costa [:wcosta] from comment #43)
> > Investigating the logs, the tc build has a much higher occurrence rate of
> > the following log message:
> > 
> > 1251 INFO - PID 1965 | ###!!! [Child][MessageChannel] Error:
> > (msgtype=0x8E0025,name=PNecko::Msg_RemoveRequestContext)                    
> > |Channel closing: too late to send/recv, messages will be lost
> > 
> > And this is causing delays in g2 test.
> 
> This is also useful info! It sounds like maybe something is not working
> properly at the IPC layer in the taskcluster builds.
> 
> Looking at the DAMP tests (which I only just now learned are devtools
> tests), the code that gets run during `openToolbox` does seem to do a lot of
> IPC messaging in the e10s case, so if IPC is somehow slower or broken I
> could absolutely believe that it would show up as a performance regression
> in these tests. I'll try to dig around in this area and see what I can find.

I hate to say it but differences in performance can trigger these. For example you tend to experience more IPC actor tear down errors in debug builds due to the changes in execution timing of communicating processes.
Flags: needinfo?(jmathies)
Given that all evidence here points to IPC, let's move this to a better component.

Has anyone tried running make *.i on some of the source files in the IPC code in a cross build vs a normal build to make sure that we're actually compiling the same code in both configurations?  I would probably start my investigations there...
Component: Untriaged → IPC
Product: Firefox → Core
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #62)
> Given that all evidence here points to IPC, let's move this to a better
> component.
> 
> Has anyone tried running make *.i on some of the source files in the IPC
> code in a cross build vs a normal build to make sure that we're actually
> compiling the same code in both configurations?  I would probably start my
> investigations there...

If you mean comparing the generated source code: yes, I did, and they are pretty much the same.
As requested by :catlee, here are the steps I followed to compare pre-compiled source code:

1. Create a special build that uploads obj-dir as an artifact, both for TC and BB builds.
2. Download and unpack both obj-dir's.
3. Remove all object/binary files.
4. Strip file paths specific to the build machine.
5. Perform a recursive diff between both trees.

No practical differences were found, apart from some newlines.
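
A rough sketch of steps 3-5 (the directory names and the machine-specific path prefixes are placeholders, not the actual values from the try runs):

# 3. Drop object/binary files so only generated sources and logs remain.
find tc-objdir bb-objdir \( -name '*.o' -o -name '*.a' -o -name '*.dylib' \) -delete

# 4. Normalize the build-machine-specific path prefixes (placeholder variables).
grep -rlZ "$TC_BUILD_PREFIX" tc-objdir | xargs -0 sed -i "s,$TC_BUILD_PREFIX,BUILDDIR,g"
grep -rlZ "$BB_BUILD_PREFIX" bb-objdir | xargs -0 sed -i "s,$BB_BUILD_PREFIX,BUILDDIR,g"

# 5. Recursive diff between the two trees.
diff -ru tc-objdir bb-objdir > objdir.diff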
(In reply to Wander Lairson Costa [:wcosta] from comment #64)
> No practical differences found, apart for some newlines.

There should be *no* differences at all in the source that we compiled (sans whitespace that is).  Does diff -w show anything?
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #65)
> (In reply to Wander Lairson Costa [:wcosta] from comment #64)
> > No practical differences found, apart for some newlines.
> 
> There should be *no* differences at all in the source that we compiled (sans
> whitespace that is).  Does diff -w show anything?

I didn't try that, only a manual check.
Attached patch ipc.diff (obsolete) — Splinter Review
This is the diff of obj-dir/ipc directory for tc and bb builds.
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #65)
> (In reply to Wander Lairson Costa [:wcosta] from comment #64)
> > No practical differences found, apart for some newlines.
> 
> There should be *no* differences at all in the source that we compiled (sans
> whitespace that is).  Does diff -w show anything?

I attached the diff file for ipc subdir. The object directories themselves are quite big, but I can find a way to upload them if you want.
Flags: needinfo?(ehsan)
Please generate a unified diff.
Flags: needinfo?(ehsan)
Attached file obj.diff.gz (obsolete) —
I assume that by "unified diff" you mean the whole obj-dir tree. Please find it attached.
(In reply to Wander Lairson Costa [:wcosta] from comment #70)
> Created attachment 8862822 [details]
> obj.diff.gz
> 
> I assume that by "unified diff" you mean the whole obj-dir tree. Please find
> it attached.

By "unified diff", he means including the -u option when you run |diff|, which results in much more readable diffs.
Flags: needinfo?(wcosta)
Attachment #8862822 - Attachment is obsolete: true
Flags: needinfo?(wcosta)
Attachment #8862474 - Attachment is obsolete: true
Attached patch ipc.diffSplinter Review
(In reply to Nathan Froyd [:froydnj] from comment #71)
> (In reply to Wander Lairson Costa [:wcosta] from comment #70)
> > Created attachment 8862822 [details]
> > obj.diff.gz
> > 
> > I assume that by "unified diff" you mean the whole obj-dir tree. Please find
> > it attached.
> 
> By "unified diff", he means including the -u option when you run |diff|,
> which results in much more readable diffs.

TIL unified diff
This diff isn't actually comparing the preprocessed sources, which is what comment 62 is asking for.  It's just comparing any .cpp files that might be generated as part of the build process and the dependency files generated by the compiler.  We already know the former shouldn't differ, and the latter do indeed have trivial differences.

Please try building with -save-temps in the ipc/ directory and all of its subdirectories (this will generate a lot of files), and diffing the .i files generated.  Don't bother with diffing the .s files; we already know they differ.
Flags: needinfo?(wcosta)
Note that you may have to modify parts of paths in the resulting .i files in e.g. #line directives so you don't get *massive* amounts of differences, depending on how the build machines are set up.  That's OK, so long as you're systematic about it and you explain how you got to the results you did.  If different #line directives are including different files entirely (e.g. libstdc++ vs. libc++ or something like that), those sorts of differences should *not* be modified away.
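
In practice that can be done by rewriting only the path prefixes inside the '# <line> "<file>"' markers rather than deleting the markers entirely, so that genuinely different includes (e.g. libstdc++ vs. libc++ headers) still show up. A sketch with placeholder prefixes and a hypothetical .i file name:

# Map each build's source prefix to a common token before diffing; real
# differences in which headers were included remain visible.
diff -u \
  <(sed "s,$TC_SRC_PREFIX,SRCDIR,g" tc-objdir/ipc/glue/MessageChannel.i) \
  <(sed "s,$BB_SRC_PREFIX,SRCDIR,g" bb-objdir/ipc/glue/MessageChannel.i)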
See Also: → 1365993
Wander: :mshal is going to try to help you out with this next week. 

I know it's been a while since you had these builds set up for diffing, but if you can try to resurrect that environment for next week, it will help you two make quicker progress.
(In reply to Nathan Froyd [:froydnj] from comment #74)
> This diff isn't actually comparing the preprocessed sources, which is what
> comment 62 is asking for.  It's just comparing any .cpp files that might be
> generated as part of the build process and the dependency files generated by
> the compiler.  We already know the former shouldn't differ, and the latter
> do indeed have trivial differences.
> 
> Please try building with -save-temps in the ipc/ directory and all of its
> subdirectories (this will generate a lot of files), and diffing the .i files
> generated.  Don't bother with diffing the .s files; we already know they
> differ.

Sorry for the extremely long delay. When I try to build to generate intermediate files, I get errors both on BB and TC. I am going to look into it, but if anybody has an idea of what's going on, it would be much appreciated.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b30bea36b764&selectedJob=102903839
Flags: needinfo?(wcosta)
(In reply to Wander Lairson Costa [:wcosta] from comment #77)
> Sorry for the extremely long delay. When I try to build to generate
> intermediate files, I get errors both on BB and TC. I am going to look into,
> but if anybody has an idea what's going on, would be much appreciated.
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b30bea36b764&selectedJob=102903839

Can't say I understand why, but building with '-save-temps' seems to flag additional warnings that aren't present without the flag. I can reproduce an example locally:

$ clang ... Unified_cpp_test_testplugin0.cpp
(no warnings)
$ clang ... -save-temps Unified_cpp_test_testplugin0.cpp
In file included from /Users/marf/mozilla-central-git/obj-x86_64-apple-darwin15.6.0/dom/plugins/test/testplugin/Unified_cpp_test_testplugin0.cpp:2:
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2305:31: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
    if (((objectVariant).type == NPVariantType_Object)) {
         ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2305:31: note: remove extraneous parentheses around the
      comparison to silence this warning
    if (((objectVariant).type == NPVariantType_Object)) {
        ~                     ^                      ~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2305:31: note: use '=' to turn this equality comparison into
      an assignment
    if (((objectVariant).type == NPVariantType_Object)) {
                              ^~
                              =
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2963:26: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
      if (((result).type == NPVariantType_String)) {
           ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2963:26: note: remove extraneous parentheses around the
      comparison to silence this warning
      if (((result).type == NPVariantType_String)) {
          ~              ^                      ~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest.cpp:2963:26: note: use '=' to turn this equality comparison into
      an assignment
      if (((result).type == NPVariantType_String)) {
                         ^~
                         =
In file included from /Users/marf/mozilla-central-git/obj-x86_64-apple-darwin15.6.0/dom/plugins/test/testplugin/Unified_cpp_test_testplugin0.cpp:20:
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:50:23: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
  if (((variant).type == NPVariantType_String))
       ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:50:23: note: remove extraneous parentheses around the
      comparison to silence this warning
  if (((variant).type == NPVariantType_String))
      ~               ^                      ~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:50:23: note: use '=' to turn this equality comparison
      into an assignment
  if (((variant).type == NPVariantType_String))
                      ^~
                      =
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:52:28: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
  else if (((variant).type == NPVariantType_Int32))
            ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:52:28: note: remove extraneous parentheses around the
      comparison to silence this warning
  else if (((variant).type == NPVariantType_Int32))
           ~               ^                     ~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:52:28: note: use '=' to turn this equality comparison
      into an assignment
  else if (((variant).type == NPVariantType_Int32))
                           ^~
                           =
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:54:28: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
  else if (((variant).type == NPVariantType_Double))
            ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:54:28: note: remove extraneous parentheses around the
      comparison to silence this warning
  else if (((variant).type == NPVariantType_Double))
           ~               ^                      ~
/Users/marf/mozilla-central-git/dom/plugins/test/testplugin/nptest_utils.cpp:54:28: note: use '=' to turn this equality comparison
      into an assignment
  else if (((variant).type == NPVariantType_Double))
                           ^~
                           =
5 warnings generated.

Since we build with -Werror in automation, these warnings become errors and the build fails. I'd guess the best thing to try is to disable -Werror in your try push and see if that helps. Though due to the difference in behavior with and without -save-temps, I don't have the best confidence in the output.
You could also try to use our built-in .i rules instead of -save-temps with a patch:

diff --git a/config/rules.mk b/config/rules.mk
index 2551b07..c32580e 100644
--- a/config/rules.mk
+++ b/config/rules.mk
@@ -1088,7 +1088,6 @@ $(filter %.s,$(CSRCS:%.c=%.s)): %.s: %.c $(call mkdir_deps,$(MDDEPDIR))
        $(REPORT_BUILD_VERBOSE)
        $(CC) -S $(COMPILE_CFLAGS) $($(notdir $<)_FLAGS) $(_VPATH_SRCS)

-ifneq (,$(filter %.i,$(MAKECMDGOALS)))
 # Call as $(call _group_srcs,extension,$(SRCS)) - this will create a list
 # of the full sources, as well as the $(notdir) version. So:
 #   foo.cpp sub/bar.cpp
@@ -1115,6 +1114,7 @@ $$(_PREPROCESSED_$1_FILES): %.i: %.$1
        $$(addprefix $$(MKDIR) -p ,$$(filter-out .,$$(@D)))
        $$($3) -C $$(PREPROCESS_OPTION)$$@ $(foreach var,$4,$$($(var))) $$($$(notdir $$<)_FLAGS) $$(_VPATH_SRCS)

+target:: $$(_PREPROCESSED_$1_FILES)
 endef

 $(eval $(call PREPROCESS_RULES,cpp,CPPSRCS,CCC,COMPILE_CXXFLAGS))
@@ -1173,8 +1173,6 @@ endif

 endif

-endif
-
 $(RESFILE): %.res: %.rc
        $(REPORT_BUILD)
        @echo Creating Resource file: $@

Then you should get all the .i files and can skip -save-temps (which seems to generate a lot of extra data - some of the .s files are huge). You would still need to turn off -Werror, though.
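
For the -Werror part, a hedged mozconfig tweak for these comparison pushes only (assuming the in-tree --disable-warnings-as-errors switch, which drops -Werror from automation builds):

# Added to the mozconfig used for the .i-generating try pushes.
ac_add_options --disable-warnings-as-errors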
Disabling -Werror and applying :mshal's patch (yay, thanks) improved things a lot, but I have now reached another level of weird issues [1], especially on BB.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c53a72985b41&selectedJob=103327939
(In reply to Wander Lairson Costa [:wcosta] from comment #80)
> Disabling -Werror and applying :mshal's patch (yay, thanks), it just
> improved a lot, but I now reached another level of weird issues [1],
> specially on BB.
> 
> [1]
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c53a72985b41&selectedJob=103327939

The TC one looks like a clang plugin issue; you may want to disable the static analysis on mac jobs for the purposes of this comparison:

http://dxr.mozilla.org/mozilla-central/source/build/macosx/cross-mozconfig.common#48
http://dxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/macosx64/opt-static-analysis#11
http://dxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/macosx64/debug-static-analysis#10

The BB one is a weird error; maybe we're using our own ranlib/ar for C++ compilation, but we're not passing that through to Rust builds, and we should be?
Could just disable geckodriver for the purposes of this comparison, since we're not running any tests with these builds.
Latest try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=29e31e6f30ed&selectedJob=103483632

At least we now have the same error for both BB and TC. I suspect the problem arises from file unification.
(In reply to Michael Shal [:mshal] from comment #78)
> Can't say I understand why, but building with '-save-temps' seems to flag
> additional warnings that aren't present without the flag. I can reproduce an
> example locally:

The temp files have been processed by the C preprocessor so C macros have been expanded and clang is reporting warnings in code that it would have ignored in the original source files.

If these new warnings only affect Unified_cpp_test_testplugin0.cpp, we can just add `if CONFIG['CLANG_CXX']: CXXFLAGS += ['-Wno-error=parentheses-equality']` to the bottom of dom/plugins/test/testplugin/testplugin.mozbuild.
(In reply to Wander Lairson Costa [:wcosta] from comment #83)
> Latest try push:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=29e31e6f30ed&selectedJob=103483632
> 
> At least we now have the same error for both BB and TC. I suspect the
> problem arises from file unification.

Building without unification is going to pose its own set of problems, unfortunately. If you try the same push with sccache disabled does it change anything?
(In reply to Chris Peterson [:cpeterson] from comment #84)
> (In reply to Michael Shal [:mshal] from comment #78)
> > Can't say I understand why, but building with '-save-temps' seems to flag
> > additional warnings that aren't present without the flag. I can reproduce an
> > example locally:
> 
> The temp files have been processed by the C preprocessor so C macros have
> been expanded and clang is reporting warnings in code that it would have
> ignored in the original source files.

Hmm, I don't follow - all I'd expect -save-temps to do is to save the output of the C preprocessor and compiler that it has to run anyway instead of discarding them as intermediate files. Why would saving the output of the preprocessor change the behavior? The compiler should see the same preprocessed code regardless of whether it is also copied to a .i file.

> If these new warnings only affect Unified_cpp_test_testplugin0.cpp, we can
> just add `if CONFIG['CLANG_CXX']: CXXFLAGS +=
> ['-Wno-error=parentheses-equality']` to the bottom of
> dom/plugins/test/testplugin/testplugin.mozbuild.

Good idea, but unfortunately this isn't the only file. Using a separate rule for the .i files seems to work around the issue, though I'm not certain yet if that's generating .i's for each corresponding .o in all cases.
(In reply to Michael Shal [:mshal] from comment #86)
> (In reply to Chris Peterson [:cpeterson] from comment #84)
> > The temp files have been processed by the C preprocessor so C macros have
> > been expanded and clang is reporting warnings in code that it would have
> > ignored in the original source files.
> 
> Hmm, I don't follow - all I'd expect -save-temps to do is to save the output
> of the C preprocessor and compiler that it has to run anyway instead of
> discarding them as intermediate files. Why would saving the output of the
> preprocessor change the behavior? The compiler should see the same
> preprocessed code regardless of whether it is also copied to a .i file.

Sadly it doesn't work that way in clang:
http://lists.llvm.org/pipermail/cfe-dev/2016-November/051421.html

"-svae-temps will run the parser on the preprocessed output,
so all macro-expansion knowledge is missing. That changes the impact of
a number of warnings like redundant braces."

We've hit an almost identical issue with -Werror with clang+sccache. sccache runs the preprocessor to hash the output as part of the cache key, and then attempts to compile the already-preprocessed output. Unfortunately that fails sometimes with -Werror for the same reason.
:mshal got a build with intermediate files [1].
After cleaning up paths and line numbers, there was no difference at all:

diff -u -r tc-objdir/browser/app/nsBrowserApp.i bb-objdir/browser/app/nsBrowserApp.i
--- tc-objdir/browser/app/nsBrowserApp.i	2017-06-02 12:56:28.000000000 -0300
+++ bb-objdir/browser/app/nsBrowserApp.i	2017-06-02 12:56:06.000000000 -0300
@@ -73505,13 +73505,13 @@
                  "Firefox",
                  "firefox",
                  "55.0a1",
-                 "20170602030417",
+                 "20170601200700",
                  "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}",
                  __null, // copyright
                  (1 << 3) | (1 << 1),
                  "55.0a1",
                  "55.0a1",
-                 "https://crash-reports.mozilla.com/submit?id={ec8030f7-c20a-464f-9b0e-13a3a9e97384}&version=55.0a1&buildid=20170602030417",
+                 "https://crash-reports.mozilla.com/submit?id={ec8030f7-c20a-464f-9b0e-13a3a9e97384}&version=55.0a1&buildid=20170601200700",
                  __null
              };
 #  "/src/browser/app/nsBrowserApp.cpp" 2
Only in tc-objdir/memory/replace: dummy
Binary files tc-objdir/objdir.tar.bz2 and bb-objdir/objdir.tar.bz2 differ

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a550aef3e6b07f8c47420cdad8999455590d7d3&selectedJob=103938641
See Also: → 1369866
Hi there,

I just ran a performance comparison against the latest m-c code (with my patches that make the builds identical) [1]. Cross builds are still behind native builds.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=389d3866a6c26b7613ee0d04eb710209cab5e5f9&newProject=try&newRevision=1636956c80ce26b5757a2b8260f7f5523c1aa550&framework=1&showOnlyImportant=0
If there's no diff between the *.i files -- is there a diff between the *.s files?
(That would likely be the case only if the compilers being used weren't actually the same -- I don't know what the chance of that is -- although it certainly wouldn't surprise me if Apple patches clang in various ways...)
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #90)
> If there's no diff between the *.i files -- is there a diff between the *.s
> files?

We have diffed the disassembly of XUL.dylib, and there are no significant changes.  There are differences in offsets (e.g. jumps) in some places, but those seem pretty trivial.  The cross-compiled compiler seems to annotate jump table offsets in the .text section (so they disassemble as data, rather than code), but examining the actual bytes confirms that there are no differences in the stored offsets in the tables.

The native (buildbot) builds are codesigned, and the cross (taskcluster) builds are not; this does not appear to produce any performance differences.

The linker invocations are virtually identical besides the obvious differences in framework names and such due to the filesystem.  The native build passes in libclang_rt.osx.a on the link line due to differences in how clang sets up linking for native Darwin vs. cross (who'd be crazy enough to cross-compile to *x86-64* Darwin, eh?), but fixing that difference does not appear to change performance.

mshal observed that testing talos performance locally on his machine produces *no* appreciable difference between the builds.  This obviously does not match up with what we're seeing on infra, but we haven't figured out why that would be the case.
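
(For completeness, the codesigning difference mentioned above can be confirmed with the stock codesign tool; a sketch assuming locally unpacked .app bundles:)

# The signed build prints its signature details; an unsigned build reports
# "code object is not signed at all". codesign writes to stderr.
codesign -dv bb/Firefox.app 2>&1 | head -3
codesign -dv tc/Firefox.app 2>&1 | head -3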
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #91)
> (That would likely be the case only if the compilers being used weren't
> actually the same -- I don't know what the chance of that is -- although it
> certainly wouldn't surprise me if Apple patches clang in various ways...)

Forgot to mention: we're not using Apple's clang; we're using home-built clang, and I *believe* that our native and cross clangs are both built from the same SVN revision.  So the compilers should be identical...and extra kudos to the clang developers for trying to ensure that identical targets produce the same code regardless of host, that's a non-trivial thing to do!
(In reply to Nathan Froyd [:froydnj] from comment #92)
> (In reply to David Baron :dbaron: ⌚️UTC-7 from comment #90)
> > If there's no diff between the *.i files -- is there a diff between the *.s
> > files?
> 
> We have diffed the disassembly of XUL.dylib, and there are no significant
> changes.  There are differences in offsets (e.g. jumps) in some places, but
> those seem pretty trivial.

There were code differences between mov and ldr, are those gone now? Has libmozglue.dylib been compared too?
(In reply to Mike Hommey [:glandium] from comment #94)
> There were code differences between mov and ldr, are those gone now?

s/ldr/lea/
Can we ship cross-compiled builds to the Nightly channel for a day or two and see if Telemetry corroborates mshal or Talos? In doing so, we may also get data that points us in the direction of the root cause, such as specific configurations that do or don't exhibit the slowdown.
(In reply to Gregory Szorc [:gps] from comment #96)
> Can we ship cross-compiled builds to the Nightly channel for a day or two
> and see if Telemetry corroborates mshal or Talos? In doing so, we may also
> get data that points us in the direction of the root cause, such as specific
> configurations that do or don't exhibit the slowdown.

That question is outside of my pay grade; mshal might know more here, but I will also ni? him to ensure that I didn't completely misrepresent the results he saw.
Flags: needinfo?(mshal)
(In reply to Mike Hommey [:glandium] from comment #95)
> (In reply to Mike Hommey [:glandium] from comment #94)
> > There were code differences between mov and ldr, are those gone now?
> 
> s/ldr/lea/

Those differences were in libmozglue; the same phenomenon did not occur in XUL (assuming the assembly diffing was done correctly).  Perhaps we should be verifying linker command lines etc. for libmozglue as well, since that is compiled very differently?
(In reply to Nathan Froyd [:froydnj] from comment #98)
> (In reply to Mike Hommey [:glandium] from comment #95)
> > (In reply to Mike Hommey [:glandium] from comment #94)
> > > There were code differences between mov and ldr, are those gone now?
> > 
> > s/ldr/lea/
> 
> Those differences were in libmozglue; the same phenomenon did not occur in
> XUL (assuming the assembly diffing was done correctly).  Perhaps we should
> be verifying linker command lines etc. for libmozglue as well, since that is
> compiled very differently?

Has it been checked again after bug 1369866?
(In reply to Mike Hommey [:glandium] from comment #99)
> (In reply to Nathan Froyd [:froydnj] from comment #98)
> > (In reply to Mike Hommey [:glandium] from comment #95)
> > > (In reply to Mike Hommey [:glandium] from comment #94)
> > > > There were code differences between mov and ldr, are those gone now?
> > > 
> > > s/ldr/lea/
> > 
> > Those differences were in libmozglue; the same phenomenon did not occur in
> > XUL (assuming the assembly diffing was done correctly).  Perhaps we should
> > be verifying linker command lines etc. for libmozglue as well, since that is
> > compiled very differently?
> 
> Has it been checked again after bug 1369866?

No idea.  mshal and ted have been doing builds with various bits of logging enabled, and mshal is already needinfo'd, so ni'ing ted too!
Flags: needinfo?(ted)
I did a try push atop mshal+wcosta's patch stack (that's been steadily removing differences they've found) that added -v to clang invocations:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f17aec55c0fc41e9e42d7a3f69a3583286804a3f&selectedJob=104981194
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> I did a try push atop mshal+wcosta's patch stack (that's been steadily
> removing differences they've found) that added -v to clang invocations:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f17aec55c0fc41e9e42d7a3f69a3583286804a3f&selectedJob=104981194

Looks like the only significant difference in linking mozglue is the libclang_rt.osx.a issue discussed earlier.
Comment 102 is just command-line options, though...are we *super* sure that both builds are picking up the same libc++?  I think we had diffed the symbols from nm and discovered basically no differences there, but...
(In reply to Nathan Froyd [:froydnj] from comment #97)
> That question is outside of my pay grade; mshal might know more here, but I
> will also ni? him to ensure that I didn't completely misrepresent the
> results he saw.

That's right - when I run the talos tests on my mac, the results are consistent. Here's just a11yr for simplicity:

--- a11yr-mshal-cross.log ---
  a11yr: 477.958766492
--- a11yr-mshal-native.log ---
  a11yr: 471.046078259
--- a11yr-try-cross.log ---
  a11yr: 481.338055724
--- a11yr-try-native.log ---
  a11yr: 476.459945964

mshal-cross/native are my local builds, while try-cross/native are from my try push. Notice how they're all around 470-480 - no significant slowdown for the cross build. (These numbers are all slower than what we'd expect in a try push, but I assume that's just because my local machine is not the same hardware).

I'm also running the same tests on a t-yosemite-r7 machine. Here are the same exact builds:

--- a11yr-try-cross.log ---
  a11yr: 573.53111317
--- a11yr-try-native.log ---
  a11yr: 422.414690345
--- a11yr-mshal-cross.log ---
  a11yr: 420.767708579
--- a11yr-mshal-native.log ---
  a11yr: 433.032256866

Here, the "good" runs are ~420, while the try cross build is clearly worse at 570. These numbers are fairly consistent with what the actual try push shows - lower 400's for a native OSX build and upper 400's into 500's for a cross build.

One curious thing here is that a cross build built from my local machine falls into the "good" category. I've been trying to find out what's different between my local build and the try build, as well as the try cross & try native pushes. All I've discovered so far is that replacing XUL in the try-cross build with the XUL either from the native try push or from my local cross build is sufficient for improving the performance results.

If anyone else with a mac can run these locally to confirm, that would be helpful. Here's a try push with a native & cross build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdd3f1125b0c7c0edc81f5d8a42a81ae7c8436fd

Alternatively, if someone has the cross-build tools handy and wants to generate a .dmg from their local machine, I can try to run it manually on the loaner and see how it compares.
Flags: needinfo?(mshal)
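
(For reference, the XUL-swap experiment mentioned in the previous comment amounts to something like the following; a sketch only, with placeholder paths for the two unpacked .app bundles.)

# swap_xul.py - sketch: copy XUL from one unpacked Firefox.app into another
# before re-running talos, to see which library carries the slowdown.
import shutil
import sys

donor_app, target_app = sys.argv[1], sys.argv[2]  # paths to Firefox.app bundles
shutil.copy2(donor_app + "/Contents/MacOS/XUL",
             target_app + "/Contents/MacOS/XUL")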
Wow, clearly comparing the mshal-cross and try-cross binaries would be a step forward. Could you put your mshal-cross build somewhere we can download it? (I assume the try you linked to has a build that corresponds to your try-cross results)
(In reply to Nathan Froyd [:froydnj] from comment #103)
> Comment 102 is just command-line options, though...are we *super* sure that
> both builds are picking up the same libc++?  I think we had diffed the
> symbols from nm and discovered basically no differences there, but...

Oh, duh, this is assured by zero differences between .i files.  So that's good.
(In reply to Nathan Froyd [:froydnj] from comment #106)
> (In reply to Nathan Froyd [:froydnj] from comment #103)
> > Comment 102 is just command-line options, though...are we *super* sure that
> > both builds are picking up the same libc++?  I think we had diffed the
> > symbols from nm and discovered basically no differences there, but...
> 
> Oh, duh, this is assured by zero differences between .i files.  So that's
> good.

Well, almost.  We'd want some assurances that we're picking up the same binary files from the linker, or at least knowing what files the linker is picking so we can inspect those.  The man page for OS X ld, http://www.manpages.info/macosx/ld.1.html , says that the environment variables RC_TRACE_{ARCHIVES,DYLIBS} should be set to provide that logging output.  mshal, do you want to mix setting those in with all the other logging you have?
Flags: needinfo?(mshal)
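
(A sketch of how those variables might be threaded into a build, assuming that setting them to any value is enough per the ld man page linked above; the mach invocation and log filename are illustrative, not the exact automation setup.)

# trace_ld_inputs.py - sketch: run the build with ld's RC_TRACE_* variables
# set so the linker logs every archive and dylib it actually loads.
import os
import subprocess

env = dict(os.environ, RC_TRACE_ARCHIVES="1", RC_TRACE_DYLIBS="1")
with open("ld-trace.log", "w") as log:
    subprocess.check_call(["./mach", "build"], env=env,
                          stdout=log, stderr=subprocess.STDOUT)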
(In reply to Nathan Froyd [:froydnj] from comment #107)
> Well, almost.  We'd want some assurances that we're picking up the same
> binary files from the linker, or at least knowing what files the linker is
> picking so we can inspect those.  The man page for OS X ld,
> http://www.manpages.info/macosx/ld.1.html , says that the environment
> variables RC_TRACE_{ARCHIVES,DYLIBS} should be set to provide that logging
> output.  mshal, do you want to mix setting those in with all the other
> logging you have?

Sure, here it is: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1dc2a3670ee0e8ff791cb0a992da24a731190b40
Flags: needinfo?(mshal)
Here are the .dmgs for the times mentioned in #c104: https://people-mozilla.org/~mshal/cross/

I'm not sure how relevant a direct comparison will be between my local cross build and the try build, however. For one thing, I realized my toolchain was slightly out of date wrt tooltool. I've updated that locally now with no speed difference (I was on the previous tooltool build of clang 3.9.0)

Another difference is that I'm not using the nightly mozconfig, but just the equivalent of build/macosx/cross-mozconfig.common. If I switch my local mozconfig to pull in browser/config/mozconfigs/macosx64/nightly, I see the same bad performance that the try builds get. I tried a push where I commented out the majority of the nightly and common-opt mozconfigs so that it is essentially what I build locally: https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d680b94e2182c27ce33dcab1f1a14b8961b2f3

The talos tests don't actually run correctly through mozharness, but I downloaded the build onto the loaner so I could run it through mach, and the speed is improved. It's possible one of these mozconfig options will offer a clue, though I suspect that building without, say, '--enable-profiling' speeds up the result by roughly the same amount that the cross build degrades it, so they cancel out. Perhaps this is why my local cross build looked to be the same as native when I was building without the nightly mozconfig.

I'm out tomorrow, but I'll try to dig in more next week to see what specifically was causing the change.
None of #c109 changes the fact that I still don't see performance differences when I run the tests on my local mac, so I still think it is worthwhile for someone else with an OSX machine to try the comparison, and/or to see about running a test on Nightly users as per #c96.
I've narrowed down the mozconfig flag from #c109 that was making my local cross builds appear fast to --disable-install-strip. Our macosx64 nightly mozconfig contains the flag (so it is used by both the native & cross builds on try), while my local cross build did not. On a try push where this flag is removed, the cross-build talos performance is now in-line with native:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f18d021a32a5&newProject=try&newRevision=7ec5a7fda7079bdbb4927c8e9af76a03abbfc82f&framework=1&showOnlyImportant=0

Curiously, removing this flag doesn't change the performance of the native build.

Anyone have any ideas why removing the flag would only affect cross builds? And does anyone know why we needed this flag on OSX in the first place? (blame will tell you that I added the flag, but that was when I was bringing the macosx/* mozconfigs in-line with the old macosx-universal/* mozconfigs. The --disable-install-strip flag was present in the first revision of macosx-universal/nightly from bug 558180.)
We use --disable-install-strip on Nightly so users can profile Nightly builds.
I want to formally summarize some points I just made in #build.

Currently, Talos on Nightly is measuring unstripped binaries. Talos on other release channels (Beta/Release/etc.) is measuring stripped binaries. This means we have no continuous CI coverage for Talos on stripped binaries against the development tip of Firefox. Instead, we get one Talos signal for stripped binary performance every 6 weeks - after uplift.

I'm willing to wager that people looking at Talos changes to Beta, etc after uplift are making the assumption that every Talos change was previously triaged on central/Nightly and that Talos changes on non-Nightly channels are mostly rubber stamped. If we're lucky, a significant regression will cause someone to dig deeper. But there's no guarantee of this.

Since we're not actively testing stripped binaries on central, it is entirely possible we will regress performance of stripped binaries and we will have no clue which commit it was (any from that 6 week cycle are candidates) and we may even fail to spot the regression at all!

I think this is problematic and I would lobby for running Talos against stripped binaries for central/inbound/autoland. But that's a discussion for another bug.

Now, the good news in all of this is that I think it may be acceptable to enable OS X cross builds today. We're still not sure why performance of unstripped cross builds is slower than that of native unstripped builds. But, *this is not a configuration we ship outside of the Nightly channel*. So, the regression only occurs for Nightly users and is not something that will ride the trains. Furthermore, I would argue that the performance regressions don't invalidate the value of the Talos measurements: they just shift the values. If someone makes an operation faster or slower, it will (likely) impact the Talos data regardless of whether the binary is stripped or built natively vs. cross.

Because this regression is blocking such important work, I think it is acceptable to take the Talos regression (knowing it won't impact release channels nor significantly invalidate ongoing Talos measurements on Nightly) and leave the cross build unstripped binary performance issue as a P1 bug for the build team to resolve with high urgency.
This is great information; we do look at beta vs. aurora vs. trunk on just about every merge day when results come in. We try to match up all regressions on beta with where they were originally introduced on trunk.

Looking at this bug, there is a 34% sessionrestore regression, but mozilla-beta shows parity if not an improvement in sessionrestore over trunk for the last 90 days:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=%5Bmozilla-inbound,e9484a55bdf595b7b86aea582bb9016cdd1ba956,1,1%5D&series=%5Bmozilla-beta,e9484a55bdf595b7b86aea582bb9016cdd1ba956,1,1%5D

I would expect this to be much worse if stripped binaries proved to be the cause of the large measured regression in performance. Ideally we would be shipping the faster builds to users, not the slower ones.

tart, which is not measuring startup, has a 27% regression for cross-compiled builds, but is effectively identical when beta is compared to trunk:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=%5Bmozilla-central,c849608509c2d340e4065b7cf9d7d88b8cd0fc8d,1,1%5D&series=%5Bmozilla-beta,c849608509c2d340e4065b7cf9d7d88b8cd0fc8d,1,1%5D
I didn't see comment 111. So the stripped builds do not have any effect for native builds, but do for cross-compiled builds. Looking at :mshal's try link, the numbers seem to confirm that we will see little to no change for what we ship, but will see a large regression on mozilla-beta.

The only thing this will make hard is that when we merge trunk->beta, matching up the regressions that were not fixed on trunk and determining which bugs they are related to on mozilla-beta will be difficult. Probably OK to just ignore OSX regressions if the answer isn't obvious.
(In reply to Joel Maher ( :jmaher) from comment #115)
> I didn't see comment 111, so the stripped builds do not have any effect for
> native builds, but do for cross compiled builds, looking at :mshal's try
> link, the numbers seem to confirm that we will see little to no change for
> what we ship, but will see  a large regression on mozilla-beta.

Sorry, I'm not quite following that--why will we see a regression on mozilla-beta?
I assume beta will be stripped, therefore it will have much higher performance than nightly according to talos.  so if there is a regression that shows up on nightly, when that merges to beta, we validate it and link it to the appropriate bug for tracking purposes, in this case it will not be as simple as seeing where the graphs started out the same and looking for changes.
Depends on: 1373014
(In reply to Joel Maher ( :jmaher) from comment #117)
> I assume beta will be stripped, therefore it will have much higher
> performance than nightly according to talos.  so if there is a regression
> that shows up on nightly, when that merges to beta, we validate it and link
> it to the appropriate bug for tracking purposes, in this case it will not be
> as simple as seeing where the graphs started out the same and looking for
> changes.

Oh, gotcha! Yes, this will make detecting regressions between nightly and beta more difficult. I'm hopeful that we'll be able to figure out the root cause here at some point.
No longer depends on: 1373014
Performance regressions in cross builds of Mac Nightly do not need to block enabling Stylo on Nightly.
Blocks: stylo-release
No longer blocks: stylo-nightly-build
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #118)
> (In reply to Joel Maher ( :jmaher) from comment #117)
> > I assume beta will be stripped, therefore it will have much higher
> > performance than nightly according to talos.  so if there is a regression
> > that shows up on nightly, when that merges to beta, we validate it and link
> > it to the appropriate bug for tracking purposes, in this case it will not be
> > as simple as seeing where the graphs started out the same and looking for
> > changes.
> 
> Oh, gotcha! Yes, this will make detecting regressions between nightly and
> beta more difficult. I'm hopeful that we'll be able to figure out the root
> cause here at some point.

As we discussed in the Quantum Dev mtg last week, we have two options here to move forward:

1) Set up a new, stripped nightly build that runs in parallel to the current unstripped nightly; or,
2) Change the current nightly to be stripped (like beta), and publish a script that downloads the symbols when required (or better still, integrate that functionality into mach)

Do others have a preference here? 

I lean towards #2 because it brings the nightly config closer to beta and avoids introducing a new build variant.

None of this precludes mshal from continuing to investigate the discrepancy. He plans to spend the rest of the month on that, barring a breakthrough.
I like the idea of having more convergence between builds (so #2). However, I'm worried about negative developer impacts of stripping Nightly. My understanding is things like the Gecko Profiler will break as well. If we do this, I'd like to know what all will need retrofitting to handle symbols. And that will likely require a post to dev-platform to see what unknown tools leveraging symbols are in use.

I think it will be a non-trivial amount of work to add symbol fetching into every tool that needs it. Is that work worth doing? Probably. But depending on the scope bloat, it may not be realistic as a quick and easy stop-gap.
IPC has been exonerated. This is the realm of the build system.
Component: IPC → Build Config
(In reply to Chris Cooper [:coop] from comment #120)
> 2) Change the current nightly to be stripped (like beta), and publish a
> script that downloads the symbols when required (or better still, integrate
> that functionality into mach)

I think I mentioned offhand in a meeting, but I wrote such a script years ago:
https://hg.mozilla.org/users/jwatt_jwatt.org/fetch-symbols/file/tip/fetch-symbols.py

We can take that code and use it as needed. The Gecko profiler should actually already be OK, since it can symbolicate profiles using symbolapi.mozilla.org. The bigger issue is other local native profilers, which would need local symbols to do anything useful. The fetch-symbols script should suffice for that use case, but it does mean people will have to wind up downloading a few hundred MB of symbol files to do profiling.
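
(A rough sketch of the kind of download involved; this is not the fetch-symbols script itself, and the URL is a placeholder rather than a real artifact location.)

# fetch_symbols_sketch.py - sketch only: download and unpack a
# crashreporter-symbols zip next to a local build for native profilers.
import urllib.request
import zipfile

SYMBOLS_URL = "https://example.invalid/firefox-nightly.crashreporter-symbols.zip"  # placeholder
urllib.request.urlretrieve(SYMBOLS_URL, "symbols.zip")
with zipfile.ZipFile("symbols.zip") as zf:
    zf.extractall("symbols")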
(In reply to Gregory Szorc [:gps] from comment #121)
> I like the idea of having more convergence between builds (so #2). However,
> I'm worried about negative developer impacts of stripping Nightly. My
> understanding is things like the Gecko Profiler will break as well. If we do
> this, I'd like to know what all will need retrofitting to handle symbols.
> And that will likely require a post to dev-platform to see what unknown
> tools leveraging symbols are in use.

I posted to dev.platform:

https://groups.google.com/d/msg/mozilla.dev.platform/7pPlfH0a7Ro/evWkUDjHAQAJ

Given Ted's comments, I'm less worried about fetching symbols.
As I understand it, the situation is:
- Native build (fast)
- Native build - stripped (fast)
- Cross build (slow)
- Cross build - stripped (fast)

For those of us just catching up here, can someone post the performance results and links to the builds for these 4 configurations, as well as the clang SVN revision and compiler binaries?

Also, from comment 92 it seems like there are some differences in the generated code for cross builds. Do we have any idea where these differences are coming from?
sewardj asked for CPU specs for the machine.

http://www.everymac.com/systems/apple/mac_mini/specs/mac-mini-core-i7-3.0-late-2014-specs.html

      Model Identifier: Macmini7,1
      Processor Name: Intel Core i7
      Processor Speed: 3 GHz
      Number of Processors: 1
      Total Number of Cores: 2
      L2 Cache (per Core): 256 KB
      L3 Cache: 4 MB
      Memory: 16 GB
      APPLE SSD SM0256G
(In reply to Jeff Muizelaar [:jrmuizel] from comment #125)
> As I understand it the situations is:
> - Native build (fast)
> - Native build - stripped (fast)
> - Cross build (slow)
> - Cross build - stripped (fast)
> 
> For those of us just catching up here can someone post the performance
> results and links to the builds for these 4 configurations as well as the
> clang svn revision and compiler binaries?

Native vs cross: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f18d021a32a5
Native vs cross, stripped: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ec5a7fda7079bdbb4927c8e9af76a03abbfc82f

Perfherder comparison of the above: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f18d021a32a5&newProject=try&newRevision=7ec5a7fda7079bdbb4927c8e9af76a03abbfc82f&framework=1&showOnlyImportant=0

Compiler info, native: https://hg.mozilla.org/try/file/f18d021a32a526f02f4188197fd848f9819a51b7/browser/config/tooltool-manifests/macosx64/releng.manifest
Compiler info, cross: https://hg.mozilla.org/try/file/f18d021a32a526f02f4188197fd848f9819a51b7/browser/config/tooltool-manifests/macosx64/cross-releng.manifest

> Also, from comment 92 it seems like there are some differences in the
> generated code for cross builds. Do we have any idea where these differences
> are coming from?

The tagging of the jump tables comes about because we explicitly strip that information in the linker on native builds:

http://dxr.mozilla.org/mozilla-central/source/build/macosx/local-mozconfig.common#34

The jump offsets come about because of small (immaterial) differences in the Mach-O object headers, I believe.
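
(A small sketch for checking whether a given binary still carries data-in-code info, i.e. an LC_DATA_IN_CODE load command with a nonzero datasize; assumes otool is available and the binary path is passed as an argument.)

# check_data_in_code.py - sketch: dump the LC_DATA_IN_CODE load command,
# if any, from a Mach-O binary such as XUL.
import subprocess
import sys

lines = subprocess.check_output(["otool", "-l", sys.argv[1]], text=True).splitlines()
for i, line in enumerate(lines):
    if "LC_DATA_IN_CODE" in line:
        # print the load command header plus cmd/cmdsize/dataoff/datasize lines
        print("\n".join(lines[max(i - 1, 0):i + 4]))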
(In reply to Nathan Froyd [:froydnj] from comment #127)

Thanks.

It seems like we could turn this on in both builds to further minimize the difference.

Are we using the same version of the linker in both places?

One guess I have as to the performance difference is differences in section offsets between the builds. They differ between the native and cross builds, and they change during the stripping process.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #128)
> It seems like we could turn this on in both builds to further minimize the
> difference.

I've been building with -Wl,-no_data_in_code_info in the cross builds recently, though it doesn't change the performance. I believe wcosta is working on that (and other patches) in order to make sure everything is identical going forward.

With this flag in place I compared *.o with otool -t (for the text segment) and otool -d (for the data segment). Almost all files matched, except the following:

text section differences (otool -t):
 - ./db/sqlite3/src/sqlite3.o
 - ./gfx/skia/GrGLGpu.o
 - ./gfx/skia/SkMatrix.o
 - ./gfx/skia/Unified_cpp_gfx_skia21.o
 - ./layout/forms/Unified_cpp_layout_forms0.o
 - ./media/libjpeg/jchuff.o
 - ./media/libvpx/vp9_firstpass.o
 - ./security/nss/lib/certdb/certdb_certdb/crl.o - if AcquireDPCache is commented out, diff is exact (even if just the return statement is commented out!)

data section differences (otool -d)
 - ./layout/forms/Unified_cpp_layout_forms0.o
 - ./security/nss/lib/certdb/certdb_certdb/crl.o

I couldn't exactly figure out why these files were generated differently - the .i files are identical in all cases. For sqlite3.c I found I could make them produce the same output with just this diff:

$ cat sqlite3-diff.patch 
diff --git a/db/sqlite3/src/sqlite3.c b/db/sqlite3/src/sqlite3.c
index 657a54e..443fdfb 100644
--- a/db/sqlite3/src/sqlite3.c
+++ b/db/sqlite3/src/sqlite3.c
@@ -123826,6 +123826,7 @@ SQLITE_PRIVATE void sqlite3ColumnDefault(Vdbe *v, Table *pTab, int i, int iReg){
 **          \_______/ \________/     \______/       \________________/
 *            onError   pTabList      pChanges             pWhere
 */
+extern int foo;
 SQLITE_PRIVATE void sqlite3Update(
   Parse *pParse,         /* The parser context */
   SrcList *pTabList,     /* The table in which we should change things */
@@ -124435,7 +124436,7 @@ SQLITE_PRIVATE void sqlite3Update(
   }else if( eOnePass==ONEPASS_MULTI ){
     sqlite3VdbeResolveLabel(v, labelContinue);
     sqlite3WhereEnd(pWInfo);
-  }else if( pPk ){
+  }else if( foo ){
     sqlite3VdbeResolveLabel(v, labelContinue);
     sqlite3VdbeAddOp2(v, OP_Next, iEph, addrTop); VdbeCoverage(v);
   }else{

But I don't know if that tells us anything. In any case, swapping out these cross .o's for natively-compiled versions resulted in no performance difference. So I suspect even if we figured out why they were generated differently, it wouldn't help here.

> Are we using the same version of the linker in both places?

cctools/bin/{x86_64-apple-darwin11-}ld reports version 274.2 in both the cross and native toolchains.

As a test I tried copying the objdir from a cross build onto a mac and linking XUL natively, then injecting it back into the cross build. Unfortunately that had no effect on the performance.

> One guess I have as to the performance difference is differences in section
> offsets between the builds. There different between the native and cross
> builds, and they change during the stripping process.

Would this be affected by the CWD of the build? I've been using a mac loaner to more directly compare the output of the compiler on a native build and a cross build (see .o files above). In order to make things easier to diff I built the native build using Taskcluster-style paths (/home/worker/workspace/build/src instead of /builds/slave/try-m64-0000000000000000000000/build/src). Curiously when I ran talos tests on this native build, the performance was bad. When I checked the same tree and toolchain out into the /builds/slave/... style path, the performance was good. This is the same machine, same revision, same tooltool manifest, just a different CWD.

So I tried to hack taskcluster/scripts/builder/build-linux.sh to use /builds/slave/... instead of /home/worker/workspace, and the cross-performance is now good (without stripping).

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1bbe1db6f9349f24ae7d188a40c6a577f7cf39a0&newProject=try&newRevision=464c24f7ec9eaeb4a01bce6b0b2c5bdaf5e43ba3&framework=1&showOnlyImportant=0

This is the only difference between the two:

https://hg.mozilla.org/try/rev/a70be9c6dc1401061adc4b30858120c8cde1c256
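
(The per-object otool comparison described in the comment above can be reproduced with something along these lines; a sketch only, with placeholder objdir paths and assuming the two objdirs share the same relative layout.)

# compare_objects.py - sketch: walk two objdirs, dump text (-t) and data (-d)
# sections with otool, and report object files whose dumps differ.
import os
import subprocess
import sys

def section(path, flag):
    # drop the first line of otool output, which is just the (differing) file path
    return subprocess.check_output(["otool", flag, path], text=True).splitlines()[1:]

native_objdir, cross_objdir = sys.argv[1], sys.argv[2]
for root, _, files in os.walk(native_objdir):
    for name in files:
        if not name.endswith(".o"):
            continue
        rel = os.path.relpath(os.path.join(root, name), native_objdir)
        other = os.path.join(cross_objdir, rel)
        if not os.path.exists(other):
            continue
        for flag in ("-t", "-d"):
            if section(os.path.join(root, name), flag) != section(other, flag):
                print("%s differs: %s" % (flag, rel))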
So... comment 129 is confirming the last paragraph of comment 128...
The interesting part is that the path is longer on buildbot vs. taskcluster. It seems it's sheer luck that it's faster with buildbot.
So it sounds like some performance sensitive code is being negatively impacted by page or cache line alignment and that the existing build path length just so happens to result in alignment with optimal performance? Fun. If so, yeah, we'll want to change how we invoke the linker to ensure optimal alignment going forward.

mshal: one caveat to moving things to /builds/slave is that in TC that path is an AUFS volume whereas /home/worker/workspace is an EXT4 volume (at least outside of Try). This is relevant because AUFS has horrible performance for concurrent I/O. See bug 1291940. It doesn't need to block the transition, but at some point we'll want to be sure we're building from a TC cache volume using EXT4. Perhaps if you pad the path in /home/worker/workspace so it has the same length as /builds/slave/... it will just work?
Yeah, I say we just make a longer path name under the existing `/home/worker/workspace` directory to get paths that are as long as the buildbot paths, and then file a followup to figure out how we can make this less sucky, maybe with explicit linker instructions or something.
It seems it's not the length of the path, but whether or not /home is in the path. I tried just bumping the path length in a TC build, but the performance is still slow:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=acb807794ae8a4e53988abf52194c9fead4a0c84
-: WORKSPACE                     ${WORKSPACE:=/home/worker/workspace}
+: WORKSPACE                     ${WORKSPACE:=/home/worker/workspace0000000000000000000000}

After fiddling around with various paths, it seems that if we have /home in the path, it's slow, otherwise it's fast.

On the OSX machines we have /home mounted as auto_home (something about autofs?):

$ mount
map auto_home on /home (autofs, automounted, nobrowse)
...

With this mounted, I get bad performance on the cross build and good performance on the buildbot build. If I disable it in /etc/auto_master and unmount it, I get good performance in both:

--- with-autohome-try-cross-f18d021a32a526f02f4188197fd848f9819a51b7.log ---
  a11yr: 576.89531924
--- with-autohome-try-native-f18d021a32a526f02f4188197fd848f9819a51b7.log ---
  a11yr: 420.733328064

--- no-autohome-try-cross-f18d021a32a526f02f4188197fd848f9819a51b7.log ---
  a11yr: 421.343462125
--- no-autohome-try-native-f18d021a32a526f02f4188197fd848f9819a51b7.log ---
  a11yr: 418.669810685

(The with-autohome-* logs match the configuration we have in automation, while no-autohome* is with /home unmounted)

You can see a slight performance difference in accessing /home when it's mounted that way (here's 1M stats in a loop):

/Users: 0m0.880s
/home no auto_home: 0m0.867s
/home with auto_home: 0m1.004s
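
(A rough reconstruction of that micro-benchmark; a sketch, not the exact code used, with the target path as a placeholder.)

# stat_loop.py - sketch: time 1M stat() calls against a given path to
# compare /Users, /home without auto_home, and /home with auto_home.
import os
import sys
import time

path = sys.argv[1] if len(sys.argv) > 1 else "/home/worker/does-not-exist"
start = time.time()
for _ in range(1000000):
    try:
        os.stat(path)
    except OSError:
        pass  # the build-time paths don't exist on the test machine
print("%s: %.3fs" % (path, time.time() - start))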

I guess with the debug information in, it is trying to check all the source files or something? I tried to check this with dtruss but couldn't find any evidence that that is what was happening.

In any case, I think we should try to remove this line from /etc/auto_master in the OSX tester image and see if that fixes it:

/home			auto_home	-nobrowse,hidefromfinder

(I'm assuming we don't actually need autofs for anything on the test machines)

I don't have /home mounted this way on my local mac, which I guess is why I don't see any performance difference running these talos tests locally.
See Also: → 1375414
Ah, that makes a lot more sense! I'm not sure exactly what it's looking for, but it's trying to stat *something* from the original build location and failing, clearly. FWIW, my Mac has /home mounted (I don't remember ever fiddling with that, so it must be the default?), so it would probably behoove us to fix this, because it might actually show up on users' machines.

On TC Windows builds we build from a symlink dir because the default task directory has the Task ID encoded in it, and we wanted consistent paths so we could get sccache cache hits. Maybe we can just make a /builds -> /home/worker/workspace symlink, and build with $PWD in /builds?
Here's what we do for Windows generic worker builds:
https://dxr.mozilla.org/mozilla-central/rev/f8bb96fb5c4f9960c5f754f877eceb677df18ddc/taskcluster/taskgraph/transforms/job/mozharness.py#238

We symlink z:\build to the task dir, then `cd /d z:\build` and build from there.
I tried the symlink approach, but it seems somewhere along the way we're normalizing the path back to /home/worker, so the same paths end up in the final executables.

I think we should just be able to change /home/worker to /builds/worker (or something) everywhere in taskcluster and get the performance boost, though I'm having trouble sorting out what to change in order to get the base image built with the new paths. Just changing /home/worker to /builds/worker in the whole tree did not work :). wcosta, would you be able to take a look at what would be involved to do this?
Flags: needinfo?(wcosta)
(In reply to Michael Shal [:mshal] from comment #137)
> I tried the symlink approach, but it seems somewhere along the way we're
> normalizing the path back to /home/worker, so the same paths end up in the
> final executables.
> 
> I think we should just be able to change /home/worker to /builds/worker (or
> something) everywhere in taskcluster and get the performance boost, though
> I'm having trouble sorting out what to change in order to get the base image
> built with the new paths. Just changing /home/worker to /builds/worker in
> the whole tree did not work :). wcosta, would you be able to take a look at
> what would be involved to do this?

I am working on a patch, but it is trickier than expected. It seems nobody predicted we would ever change the home directory.
Flags: needinfo?(wcosta)
I couldn't run tests yet due to bug 1376584
Depends on: 1376584
Comment on attachment 8881632 [details]
Bug 1338651 part 1: Change docker image home dir to /build.

https://reviewboard.mozilla.org/r/152796/#review157944

This is looking awesome! I'd like to re-review afterward to make sure we didn't miss anything. I think the only main issue is making sure we don't have both a /build and a /builds.

Were you able to get an actual build task completed with this? If so I can give it a quick test on my loaner as a sanity check.

::: taskcluster/ci/toolchain/linux.yml:23
(Diff revision 1)
>      run:
>          using: toolchain-script
>          script: build-clang-linux.sh
>          tooltool-downloads: public
>          resources:
> -            - 'build/build-clang/**'
> +            - '/build/build-clang/**'

Did you need to change these for some reason? I believe these are supposed to be tree-relative paths (not full paths), so 'build/build-clang/**' is correct here. (Same with the rest of this file, and macosx.yml and windows.yml)

::: taskcluster/docker/android-gradle-build/Dockerfile:39
(Diff revision 1)
>  # the directory where mozharness is run (not its --work-dir)
> -ADD           buildprops.json /home/worker/
> +ADD           buildprops.json /build/worker/
>  
>  # install tooltool directly from github where tooltool_wrapper.sh et al. expect
>  # to find it
>  RUN wget -O /builds/tooltool.py https://raw.githubusercontent.com/mozilla/build-tooltool/master/tooltool.py

Does this mean we have both /builds and /build now? If so we should just pick one and put everything in there.

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:67
(Diff revision 1)
>      worker['max-run-time'] = test['max-run-time']
>      worker['retry-exit-status'] = test['retry-exit-status']
>  
>      artifacts = [
>          # (artifact name prefix, in-image path)
> -        ("public/logs/", "/home/worker/workspace/build/upload/logs/"),
> +        ("public/logs/", "build/worker/workspace/build/upload/logs/"),

Do these need to have a '/' in the front?

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:86
(Diff revision 1)
>  
>      worker['caches'] = [{
>          'type': 'persistent',
>          'name': 'level-{}-{}-test-workspace'.format(
>              config.params['level'], config.params['project']),
> -        'mount-point': "/home/worker/workspace",
> +        'mount-point': "build/worker/workspace",

Also here for a '/' in front.

::: taskcluster/taskgraph/transforms/job/mozharness_test.py:202
(Diff revision 1)
>  
>      # jittest doesn't have blob_upload_dir
>      if test['test-name'] != 'jittest':
>          artifacts.append({
>              'name': 'public/test_info',
> -            'path': 'build/blobber_upload_dir',
> +            'path': '/build/blobber_upload_dir',

I don't think this is supposed to be a full path, but rather correspond to the 'build' directory that mozharness uses (eg: /build/worker/workspace/build/blobber_upload_dir), so this change should be removed.

::: taskcluster/taskgraph/transforms/job/spidermonkey.py:81
(Diff revision 1)
> -        '--chown-recursive', '/home/worker/tooltool-cache',
> -        '--vcs-checkout', '/home/worker/workspace/build/src',
> +        '--chown-recursive', '/build/worker/tooltool-cache',
> +        '--vcs-checkout', '/build/worker/workspace/build/src',
>          '--',
>          '/bin/bash',
>          '-c',
> -        'cd /home/worker && workspace/build/src/taskcluster/scripts/builder/%s' % script
> +        'cd build/worker && workspace/build/src/taskcluster/scripts/builder/%s' % script

'cd /build/worker' instead of 'cd build/worker'

::: testing/mozharness/configs/single_locale/linux32.py:1
(Diff revision 1)
> -linux.py
> +import os

I think something got messed up with the diff here, or maybe mozreview is doing something funky. What was actually changed here?

::: testing/mozharness/configs/single_locale/linux32_devedition.py:1
(Diff revision 1)
> -linux_devedition.py
> +import os

This file seems weird too. The old version is just the filename, and the new version is the entire file.
Attachment #8881632 - Flags: review?(mshal)
My build try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=adeb2d9777fe386c08db116378a6e6b411bb75fc&selectedJob=110176395

When backlog gets more manageable will do a full try push.
(In reply to Wander Lairson Costa [:wcosta] from comment #142)
> My build try push:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=adeb2d9777fe386c08db116378a6e6b411bb75fc&selectedJob=110176395

I grabbed the opt TC build and ran a11yr and sessionrestore on my loaner - the numbers look good to me! Confirmation from an actual talos job would be nice too, of course :). Nice work!
Comment on attachment 8881632 [details]
Bug 1338651 part 1: Change docker image home dir to /build.

https://reviewboard.mozilla.org/r/152796/#review158290

::: commit-message-8f80d:1
(Diff revision 2)
> +Bug 1338651: Change docker image home dir to /build. r=mshal,dustin

Can you double-check with gps or TC members to make sure that https://bugzilla.mozilla.org/show_bug.cgi?id=1338651#c132 won't be a problem?

::: taskcluster/taskgraph/transforms/job/common.py:31
(Diff revision 2)
>          'name': 'level-{}-{}-build-{}-{}-workspace'.format(
>              config.params['level'], config.params['project'],
>              taskdesc['attributes']['build_platform'],
>              taskdesc['attributes']['build_type'],
>          ),
> -        'mount-point': "/home/worker/workspace",
> +        'mount-point': "build/worker/workspace",

This should be "/builds..." instead of "build..."

::: taskcluster/taskgraph/transforms/job/common.py:44
(Diff revision 2)
>  def docker_worker_add_tc_vcs_cache(config, job, taskdesc):
>      taskdesc['worker'].setdefault('caches', []).append({
>          'type': 'persistent',
>          'name': 'level-{}-{}-tc-vcs'.format(
>              config.params['level'], config.params['project']),
> -        'mount-point': "/home/worker/.tc-vcs",
> +        'mount-point': "build/worker/.tc-vcs",

/builds

::: taskcluster/taskgraph/transforms/job/mozharness.py:170
(Diff revision 2)
> -        '--chown-recursive', '/home/worker/tooltool-cache',
> -        '--vcs-checkout', '/home/worker/workspace/build/src',
> -        '--tools-checkout', '/home/worker/workspace/build/tools',
> +        '--chown-recursive', '/builds/worker/tooltool-cache',
> +        '--vcs-checkout', '/builds/worker/workspace/build/src',
> +        '--tools-checkout', '/builds/worker/workspace/build/tools',
>          '--',
>      ]
> -    command.append("/home/worker/workspace/build/src/{}".format(
> +    command.append("build/worker/workspace/build/src/{}".format(

/builds

::: taskcluster/taskgraph/transforms/job/run_task.py:22
(Diff revision 2)
>  
>      # if true, add a cache at ~worker/.cache, which is where things like pip
>      # tend to hide their caches.  This cache is never added for level-1 jobs.
>      Required('cache-dotcache', default=False): bool,
>  
> -    # if true (the default), perform a checkout in /home/worker/checkouts/gecko
> +    # if true (the default), perform a checkout in build/worker/checkouts/gecko

/builds

::: taskcluster/taskgraph/transforms/job/spidermonkey.py:43
(Diff revision 2)
>      if int(config.params['level']) > 1:
>          worker['caches'].append({
>              'type': 'persistent',
>              'name': 'level-{}-{}-build-spidermonkey-workspace'.format(
>                  config.params['level'], config.params['project']),
> -            'mount-point': "/home/worker/workspace",
> +            'mount-point': "build/worker/workspace",

/builds

::: taskcluster/taskgraph/transforms/job/toolchain.py:124
(Diff revision 2)
>      worker['command'] = [
> -        '/home/worker/bin/run-task',
> +        '/builds/worker/bin/run-task',
>          # Various caches/volumes are default owned by root:root.
> -        '--chown-recursive', '/home/worker/workspace',
> -        '--chown-recursive', '/home/worker/tooltool-cache',
> -        '--vcs-checkout=/home/worker/workspace/build/src',
> +        '--chown-recursive', '/builds/worker/workspace',
> +        '--chown-recursive', '/builds/worker/tooltool-cache',
> +        '--vcs-checkout=build/worker/workspace/build/src',

/builds

::: taskcluster/taskgraph/transforms/job/toolchain.py:128
(Diff revision 2)
> -        '--chown-recursive', '/home/worker/tooltool-cache',
> -        '--vcs-checkout=/home/worker/workspace/build/src',
> +        '--chown-recursive', '/builds/worker/tooltool-cache',
> +        '--vcs-checkout=build/worker/workspace/build/src',
>          '--',
>          'bash',
>          '-c',
> -        'cd /home/worker && '
> +        'cd build/worker && '

/builds/worker

::: taskcluster/taskgraph/transforms/marionette_harness.py:34
(Diff revision 2)
>  
>          task['worker']['caches'] = [{
>              'type': 'persistent',
>              'name': 'level-{}-{}-tc-vcs'.format(
>                  config.params['level'], config.params['project']),
> -            'mount-point': "/home/worker/.tc-vcs",
> +            'mount-point': "build/worker/.tc-vcs",

/builds

::: taskcluster/taskgraph/transforms/repackage.py:146
(Diff revision 2)
>                  'type': 'file',
> -                'path': '/home/worker/workspace/build/artifacts/target.dmg',
> +                'path': '/builds/worker/workspace/build/artifacts/target.dmg',
>                  'name': 'public/build/{}target.dmg'.format(locale_output_path),
>              }, {
>                  'type': 'file',
>                  'path': '/home/worker/workspace/build/artifacts/target.complete.mar',

Looks like we'll need to update this too since it was just added.

::: testing/mozharness/configs/single_locale/linux32.py:1
(Diff revision 2)
> -linux.py
> +import os

Ohhhh, I think the problem is that this file is in the tree as a symlink. Maybe your script changed it from a symlink to a normal file, so it shows up as a weird diff here? If so I think you can just undo that part of the patch and leave it as a symlink.

::: testing/mozharness/configs/single_locale/linux32_devedition.py:1
(Diff revision 2)
> -linux_devedition.py
> +import os

Similarly here, this file should just stay as a symlink.
Attachment #8881632 - Flags: review?(mshal)
Blocks: 1376143
Comment on attachment 8881632 [details]
Bug 1338651 part 1: Change docker image home dir to /build.

https://reviewboard.mozilla.org/r/152796/#review158290

> Can you double-check with gps or TC members to make sure that https://bugzilla.mozilla.org/show_bug.cgi?id=1338651#c132 won't be a problem?

I actually logged in to a docker image with a one-click loaner and confirmed /builds/worker is mounted on an ext4 FS.
We were poking around a bit more and found where the accesses are coming from using dtrace on the whole system (suggested by ted):

dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'

On a /builds build, the results look like this:

  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp.o
  1    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp_darwin.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp_darwin.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp_darwin.o
  0    151                       open:entry sandboxd /builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/mozglue/build/../misc/TimeStamp_darwin.o

So sandboxd is doing 4 opens on each of the ~5000 .o files in XUL.

On a /home build, the sandboxd lookup relies on automountd and opendirectoryd, which is why those two processes spike (bug 1376143). Each lookup looks like this:

  0    151                       open:entry sandboxd /home/worker/workspace/build/src/obj-firefox/mozglue/build/../misc/TimeStamp.o
  0    937              open_nocancel:entry automountd /etc/auto_home
  1    937              open_nocancel:entry automountd /etc/auto_home
  1    937              open_nocancel:entry automountd /usr/libexec/od_user_homes
  3    151                       open:entry od_user_homes /dev/dtracehelper
  2    151                       open:entry od_user_homes /System/Library/OpenDirectory/request-schema.plist
  2    937              open_nocancel:entry od_user_homes /usr/libexec
  2    937              open_nocancel:entry od_user_homes /usr/libexec
  2    151                       open:entry od_user_homes /usr/libexec/od_user_homes

sandboxd tries to do the open, which automountd picks up because of the auto_home mount, and automountd uses opendirectory to do some sort of lookup:

automountd[895]: od_search for worker in auto_home failed
automountd[895]: od_search for * in auto_home failed

(That's from syslog with AUTOMOUNTD_VERBOSE=TRUE set in /etc/autofs.conf)

So far I haven't found much info on why sandboxd would be trying to open every file listed in the debug info when the process isn't being debugged, but I wanted to document it here for future reference.
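
(A small helper for summarizing such a dtrace log; a sketch only, taking the log filename as an argument: it counts open()/open_nocancel() probe hits per process to confirm where the bulk of the object-file opens comes from.)

# summarize_opens.py - sketch: count open*() probe hits per process in a
# dtrace log produced by the one-liner above.
import collections
import re
import sys

counts = collections.Counter()
with open(sys.argv[1]) as log:
    for line in log:
        m = re.search(r"open\w*:entry (\S+)", line)
        if m:
            counts[m.group(1)] += 1
for proc, n in counts.most_common():
    print("%8d %s" % (n, proc))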
Hopefully the last patch catches everything. Here is my try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=641ce431e0c5&selectedJob=110866341

macosx tests probably won't finish before tomorrow morning
Status: NEW → ASSIGNED
Comment on attachment 8881632 [details]
Bug 1338651 part 1: Change docker image home dir to /build.

https://reviewboard.mozilla.org/r/152796/#review158708
Attachment #8881632 - Flags: review?(dustin) → review+
Comment on attachment 8881632 [details]
Bug 1338651 part 1: Change docker image home dir to /build.

https://reviewboard.mozilla.org/r/152796/#review158730

Awesome!
Attachment #8881632 - Flags: review?(mshal) → review+
Pushed by wcosta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/feedcde68c2a
Change docker image home dir to /build. r=dustin,mshal
(In reply to Michael Shal [:mshal] from comment #147)
> So far I haven't found much info on why sandboxd would be trying to open
> every file listed in the debug info when the process isn't being debugged,
> but I wanted to document it here for future reference.

It looks like sandboxd tries to symbolicate sandbox violations. If this is the case, we should probably stop causing violations in the first place.
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c7c5ffd2eb39
Backed out changeset feedcde68c2a for bustage
(In reply to Carsten Book [:Tomcat] from comment #154)
> backed out for failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=111051404&repo=autoland

I don't think this is a bug in this patch. Re-triggering the failed jobs made them work. From the errors I see, I suspect that these failed jobs were scheduled on already-running instances that didn't download the newer docker image, causing the job to try to access a nonexistent directory. I am investigating to see if this is really the case.
Flags: needinfo?(wcosta)
(In reply to Wander Lairson Costa [:wcosta] from comment #156)
> (In reply to Carsten Book [:Tomcat] from comment #154)
> > backed out for failures like
> > https://treeherder.mozilla.org/logviewer.html#?job_id=111051404&repo=autoland
> 
> I don't think this is a bug in this patch. Re-triggering failed jobs made
> them to work. From the errors I see, I suspect that these failed jobs were
> scheduled in running instances, and they didn't download newer docker
> images, causing the job trying to access an nonexistent directory. I am
> investigating to see if this is really the case.

So, it feels like something cache-related. I compared a failed job to a successful one. In the failed job, it finds a repo cache:

[vcs 2017-06-30T06:41:17.576319Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--branch', 'default', 'https://hg.mozilla.org/build/tools', '/builds/worker/workspace/build/tools']
[vcs 2017-06-30T06:41:17.640157Z] ensuring https://hg.mozilla.org/build/tools@default is available at /builds/worker/workspace/build/tools
[vcs 2017-06-30T06:41:17.640347Z] (existing repository shared store: /home/worker/checkouts/hg-store/7ae7fb134bf7aec6ec96a062ff47a69053dd2973/.hg)
[vcs 2017-06-30T06:41:17.640375Z] (shared store does not exist; deleting)
[vcs 2017-06-30T06:41:18.213912Z] (sharing from existing pooled repository 7ae7fb134bf7aec6ec96a062ff47a69053dd2973)
[vcs 2017-06-30T06:41:18.415607Z] searching for changes
[vcs 2017-06-30T06:41:18.416449Z] no changes found
[vcs 2017-06-30T06:41:18.609292Z] (pulling to obtain default)
[vcs 2017-06-30T06:41:19.190337Z] (remote resolved default to d5a286aff22cc87361797f2aefc08c1ad1d85889; result is not deterministic)
[vcs 2017-06-30T06:41:19.192146Z] (revision already present locally; not pulling)
[vcs 2017-06-30T06:41:19.353057Z] 1054 files updated, 0 files merged, 0 files removed, 0 files unresolved
[vcs 2017-06-30T06:41:19.353862Z] updated to d5a286aff22cc87361797f2aefc08c1ad1d85889

While in the successful job, no repo cache is found:

[vcs 2017-06-30T14:47:53.519313Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '0c16e7a1a10f61010eb9e5cb35c1fe1dc4c95fb5', 'https://hg.mozilla.org/integration/autoland/', '/builds/worker/workspace/build/src']
[vcs 2017-06-30T14:47:53.592334Z] ensuring https://hg.mozilla.org/integration/autoland/@0c16e7a1a10f61010eb9e5cb35c1fe1dc4c95fb5 is available at /builds/worker/workspace/build/src
[vcs 2017-06-30T14:47:53.592501Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified)
[vcs 2017-06-30T14:47:54.231734Z] (sharing from existing pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29)

Later, the failed job detects a python executable inside the venv directory and "thinks" the virtualenv has already been created:

[task 2017-06-30T06:41:19.604118Z] 06:41:19     INFO - Creating virtualenv /builds/worker/workspace/build/venv
[task 2017-06-30T06:41:19.606797Z] 06:41:19     INFO - Virtualenv /builds/worker/workspace/build/venv/bin/python appears to already exist; skipping virtualenv creation.
[task 2017-06-30T06:41:19.606984Z] 06:41:19     INFO - Getting output from command: ['/builds/worker/workspace/build/venv/bin/pip', '--version']
[task 2017-06-30T06:41:19.607126Z] 06:41:19     INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/pip --version
[task 2017-06-30T06:41:19.609710Z] Traceback (most recent call last):
[task 2017-06-30T06:41:19.609755Z]   File "/builds/worker/workspace/build/src/testing/mozharness/scripts/fx_desktop_build.py", line 258, in <module>
[task 2017-06-30T06:41:19.609775Z]     fx_desktop_build = FxDesktopBuild()
[task 2017-06-30T06:41:19.609804Z]   File "/builds/worker/workspace/build/src/testing/mozharness/scripts/fx_desktop_build.py", line 106, in __init__
[task 2017-06-30T06:41:19.609826Z]     super(FxDesktopBuild, self).__init__(**buildscript_kwargs)
[task 2017-06-30T06:41:19.609856Z]   File "/builds/worker/workspace/build/src/testing/mozharness/mozharness/mozilla/building/buildbase.py", line 647, in __init__
[task 2017-06-30T06:41:19.609905Z]     self.create_virtualenv()
[task 2017-06-30T06:41:19.609939Z]   File "/builds/worker/workspace/build/src/testing/mozharness/mozharness/base/python.py", line 437, in create_virtualenv
[task 2017-06-30T06:41:19.609979Z]     halt_on_failure=True)
[task 2017-06-30T06:41:19.610014Z]   File "/builds/worker/workspace/build/src/testing/mozharness/mozharness/base/script.py", line 1564, in get_output_from_command
[task 2017-06-30T06:41:19.610180Z]     cwd=cwd, stderr=tmp_stderr, env=env)
[task 2017-06-30T06:41:19.610204Z]   File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
[task 2017-06-30T06:41:19.610284Z]     errread, errwrite)
[task 2017-06-30T06:41:19.610309Z]   File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
[task 2017-06-30T06:41:19.610457Z]     raise child_exception
[task 2017-06-30T06:41:19.610477Z] OSError: [Errno 2] No such file or directory

The successful job just creates the virtualenv from scratch. :gps, any clue whether robustcheckout has anything to do with this?
Flags: needinfo?(gps)
I think this has to do with virtualenv interaction in mozharness, not robustcheckout.
Flags: needinfo?(gps)
Looks like we already have bug 1306663 for these specific sandbox violations. In any case, we still need to play nicer when we do have violations.
See Also: → 1306663
:Tomcat, may we try to land it again? It seems like a transient problem (it is a bug, but it is cache-related and seems to have gone away) unrelated to the patch.
Flags: needinfo?(cbook)
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/485d1af7805a
Change docker image home dir to /build. r=dustin,mshal
I added a CLOBBER touch to the commit. We'll see if that helps.
Flags: needinfo?(cbook)
Pushed by kwierso@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/f8a62747c51c
Change docker image home dir to /build. r=dustin,mshal a=hopefullyavoidingmergebustagelater
Grafted this over to m-c so we can hopefully get all trunk branches using this, on the theory that having workers shared between different trees might cause issues if things are cached differently.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
:njn, would we need to update the valgrind suppression files since we changed the path that builds happen in on Taskcluster? They seem to be failing since the patch in this bug landed (which changes /home/worker to /builds/worker)

https://public-artifacts.taskcluster.net/WXpSmrWYSBmXHM9pJO52Aw/0/public/logs/live_backing.log
Flags: needinfo?(n.nethercote)
The Valgrind errors are leaks that look like this:

> ==42858== 16 bytes in 1 blocks are definitely lost in loss record 46 of 256
> ==42858==    at 0x4C2B525: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==42858==    by 0x17509424: ???
> ==42858==    by 0x1779BBA5: ???
> ==42858==    by 0x16ADCCAA: ???
> ==42858==    by 0xFFF0001B2: ???
> ==42858==    by 0x772F72656B726F76: ???
> ==42858==    by 0x65636170736B726E: ???
> ==42858==    by 0x732F646C6975622E: ???

Only the first stack frame has a function name. It's quite possible that this is a leak for which we have a suppression, but without the other function names in the stack trace the suppression won't apply.
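
A Memcheck suppression matches each stack frame with a fun: or obj: pattern, so frames reported only as "???" can't be matched individually; a frame wildcard is the only way to cover them. Purely as an illustration (this is not an existing suppression in the tree, and the name is made up), the loosest form that would still catch an unsymbolized leak entering through operator new looks something like:

{
   hypothetical-unsymbolized-leak-via-operator-new
   Memcheck:Leak
   match-leak-kinds: definite
   fun:_Znwm
   ...
}

A suppression that loose would also hide unrelated C++ leaks, so it would only make sense as a diagnostic.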

There are other (unimportant) Valgrind warnings that *do* have symbolized stack traces, like this one:

> --42853-- WARNING: unhandled syscall: 318
> ==42853==    at 0x5CEE419: syscall (in /lib64/libc-2.12.so)
> ==42853==    by 0x11360270: GenerateRandomSeed (jsmath.cpp:743)
> ==42853==    by 0x11360270: GenerateXorShift128PlusSeed (jsmath.cpp:767)
> ==42853==    by 0x11360270: JSCompartment::ensureRandomNumberGenerator() (jsmath.cpp:777)
> ==42853==    by 0x10F4E250: JSCompartment::randomHashCode() (jscompartment.cpp:1408)
> ==42853==    by 0x111321BC: JS::Symbol::new_(JSContext*, JS::SymbolCode, JSString*)

So it appears the debuginfo isn't completely messed up for the process, just partly messed up. It's possible that the missing symbols are for system libraries, while symbols for Firefox itself are ok.

In short, could this change have affected symbol access somehow, particularly for system libraries?
Flags: needinfo?(n.nethercote)
The Valgrind failures /might/ be due to rebuilding the Docker images and picking up newer package versions. This was first observed in bug 1272629 comment 34.

Basically, we can't touch certain aspects of the desktop-build image. If we do a rebuild from scratch and pick up the latest yum packages, we inherit some yet-to-be-triaged Valgrind bustage. The proper fix is to run our own package server, configured so that package installs are deterministic over time. Bug 1289812 tracks this and has more details.
Reopened to fix valgrind errors.
Status: RESOLVED → REOPENED
Flags: needinfo?(wcosta)
Resolution: FIXED → ---
So, it feels like it is just a single leak. If I add a suppression for it [1], the task succeeds [2].
I ran valgrind under strace to track all "open" calls, for a currently good build [3] and the failing one [4], but I couldn't spot a difference between the two.

I also tried to map which libraries the stack trace addresses come from, but valgrind doesn't seem to expose something like the /proc/<pid>/maps file for this.

:njn :jseward ideas?

[1] https://hg.mozilla.org/try/rev/6cf065d95bef6c9c9b966b7fbb8bd6aa4ebfdce8
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=43362462339c99c5ba8e00aef5ee051ad9d77051&selectedJob=113957343
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=27a0847991eabb1ee85e90b73bbef17c7a860771&selectedJob=113798780
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=220d84752e367d10fa38ec723883e014ebd9909d&selectedJob=113909336
Flags: needinfo?(n.nethercote)
Flags: needinfo?(jseward)
My observations in comment 166 still apply. This appears to be an issue with symbols and/or debuginfo.
Flags: needinfo?(n.nethercote)
See Also: → 1378968
Does the current fix result in the browser not trying to open all the object files using the Linux build paths, or will they still be opened but the paths are moved out of /home? If they're still opened, that might cause lots of sandbox violations, depending on the path and whether or not we do the open before the sandbox is enabled in plugin-container. If we can't stop them from being opened, we can have the sandbox allow reads from those paths, but we'll have to come up with a path scheme that makes sense keeping security in mind.
(In reply to Haik Aftandilian [:haik] from comment #173)
> Does the current fix result in the browser not trying to open all the object
> files using the Linux build paths, or will they still be opened but the
> paths are moved out of /home? If they're still opened, that might cause lots
> of sandbox violations, depending on the path and whether or not we do the
> open before the sandbox is enabled in plugin-container. If we can't stop
> them from being opened, we can have the sandbox allow reads from those
> paths, but we'll have to come up with a path scheme that makes sense keeping
> security in mind.

That's the other way around, actually. It's *because* there are sandbox violations (and they are old, from bug 1306663) that those paths are touched, because the sandbox daemon is trying to symbolicate the violations.
(In reply to Mike Hommey [:glandium] from comment #174)
> (In reply to Haik Aftandilian [:haik] from comment #173)
> > Does the current fix result in the browser not trying to open all the object
> > files using the Linux build paths, or will they still be opened but the
> > paths are moved out of /home? If they're still opened, that might cause lots
> > of sandbox violations, depending on the path and whether or not we do the
> > open before the sandbox is enabled in plugin-container. If we can't stop
> > them from being opened, we can have the sandbox allow reads from those
> > paths, but we'll have to come up with a path scheme that makes sense keeping
> > security in mind.
> 
> That's the other way around, actually. It's *because* there are sandbox
> violations (and they are old, from bug 1306663) that those paths are
> touched, because the sandbox daemon is trying to symbolicate the violations.

Thanks for the clarification. If that's the case, I don't think the paths will make a difference as they're being opened by sandboxd and not plugin-container. Disabling logging of those specific violations may avoid the symbolification. That's something we can look into until we can avoid them entirely.
See Also: → 1383841
See Also: → 1306244
Should we close this as it was already fixed by bug 1383841?
Flags: needinfo?(ted)
(In reply to Wander Lairson Costa [:wcosta] from comment #176)
> Should we close this as it was already fixed by bug 1383841?

It's less critical now, but we still need to change the /home paths despite bug 1383841 being fixed.

This problem also appears to cause Activity Monitor to take a long time to sample a Nightly content process built on automation. For a local build, sampling takes about 10 seconds on my 2015 MacBook, compared to over a minute with my Nightly channel install. During that time, the CPU utilization of opendirectoryd and automountd spikes, similar to what is reported in bug 1376143.

This probably affects the Mac crash reporter tool as well, and there may be other tools that do symbolication that would trigger it too.

Less importantly, if violation logging is turned on for debugging, we'll hit the problem. Although they shouldn't be logged now due to bug 1383841, it will be some time before our sandboxed processes run without triggering any violations. And it's always possible that an OS X update will change the sandboxing implementation in an unexpected way that triggers a violation to be logged.
I wonder if this could also be why the VP9 playback benchmark was always under-reporting MacBook performance.
User Story: (updated)
Flags: needinfo?(ted)
Summary: taskcluster cross-compiled OS X builds show Talos performance regressions vs. buildbot builds → taskcluster cross-compiled OS X builds create perf problems when not stripped (Talos performance regressions vs. buildbot builds)
The User Story seems like as good a place as any to summarize the actual problem here.
Flags: needinfo?(jseward)
Depends on: 1391806
Comment on attachment 8896446 [details]
Bug 1338651 part 2: Add suppression for leaks in libllvm-3.6-mesa.

https://reviewboard.mozilla.org/r/167694/#review176928

::: build/valgrind/x86_64-redhat-linux-gnu.sup:644
(Diff revision 2)
> +{
> +   static-object-leaks-in-libLLVM-3.6-mesa.so.  See bug 1338651.
> +   Memcheck:Leak
> +   match-leak-kinds: definite
> +   fun:_Znwm
> +   ...

I think you should remove the ... here. That way we just specify 2 frames of -mesa.so and then a call to new. This at least rules out the possibility of suppressing a leak where -mesa.so calls some other object which calls into new.
Attachment #8896446 - Flags: review?(jseward) → review+
jseward, I updated the patch, could you please give another look?
Flags: needinfo?(jseward)
Comment on attachment 8896446 [details]
Bug 1338651 part 2: Add suppression for leaks in libllvm-3.6-mesa.

https://reviewboard.mozilla.org/r/167694/#review177232

::: build/valgrind/x86_64-redhat-linux-gnu.sup:648
(Diff revision 3)
> +   fun:_Znwm
> +   obj:/*/lib*/libLLVM-3.6-mesa.so
> +   obj:/*/lib*/libLLVM-3.6-mesa.so
> +}
> +
> +{

AFAICS you don't need this one. The one above is strictly more general: _ZN4llvm14object_creator* and _ZNK4llvm17ManagedStaticBase* are both functions in libLLVM-3.6-mesa.so, so the general one should match whenever this one does.

That said, I think the existing suppression ("apparent leaks in libLLVM-3.6-mesa.so, August 2017.  See bug 1382280.") is also made redundant by the general one. The best would be if you could remove both of the specific ones and check that the new, general one catches all of the leaks.

Sorry -- I should have been clearer about this earlier on.
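
For concreteness, the single general suppression being suggested here (assembled from the diff fragments quoted above; the exact form that eventually lands may differ) would be roughly:

{
   static-object-leaks-in-libLLVM-3.6-mesa.so.  See bug 1338651.
   Memcheck:Leak
   match-leak-kinds: definite
   fun:_Znwm
   obj:/*/lib*/libLLVM-3.6-mesa.so
   obj:/*/lib*/libLLVM-3.6-mesa.so
}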
Attachment #8896446 - Flags: review+
Flags: needinfo?(jseward)
Comment on attachment 8896446 [details]
Bug 1338651 part 2: Add suppression for leaks in libllvm-3.6-mesa.

https://reviewboard.mozilla.org/r/167694/#review178490
Attachment #8896446 - Flags: review?(jseward) → review+
Pushed by wcosta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8cf125b4fa04
part 1: Change docker image home dir to /build. r=dustin,mshal
https://hg.mozilla.org/integration/autoland/rev/2f07b13f4517
part 2: Add suppression for leaks in libllvm-3.6-mesa. r=jseward
This improved some of our platform microbenchmarks. On OS X it turned the results from bimodal to unimodal:

== Change summary for alert #9128 (as of August 31 2017 09:30 UTC) ==

Improvements:

 27%  Strings PerfStripCRLF osx-10-10 opt      209,566.33 -> 152,491.00
 24%  Strings PerfCompressWhitespace osx-10-10 opt 324,455.04 -> 246,625.42

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9128
See Also: → 1383805
Product: Core → Firefox Build System