Modify file structure of B2G.app to allow for building of gaia after OSX v2 changes to Firefox (bug 1047584)

RESOLVED FIXED in Firefox 39

Status

defect

People

(Reporter: rickychien, Assigned: spohl)

Tracking

unspecified
2.2 S4 (23jan)
All
macOS

Firefox Tracking Flags

(firefox39 fixed)

Details

Attachments

(1 attachment, 4 obsolete attachments)

As discussed in bug 1089710, the meaning of GreD has changed in the current B2G.app, so we now get NS_ERROR_FILE_NOT_FOUND from Cu.import("resource://gre/bin/modules/...").

As Stephen Pohl mentioned, we should update the B2G.app package to make it work.

Stephen, do you know who can help with this?
Flags: needinfo?(spohl.mozilla.bugs)
Is this really a Firefox-for-desktop bug? If not, can you please move it to the appropriate component? Thanks!
Flags: needinfo?(ricky060709)
I don't know who could tackle this on the B2G team, but I'm happy to assist and provide feedback.
Flags: needinfo?(spohl.mozilla.bugs)
Component: General → Emulator
Product: Firefox → Firefox OS
Version: Trunk → unspecified
Actually, I may not have fully understood what's going on here. With bug 1066123 landed, I expected B2G.app on OSX to work. However, I now see that we may accidentally be setting the GreD to Contents/Resources for b2g as well (i.e. not only for Desktop Firefox OSX) because |#ifdef XP_MACOSX| may be true for b2g for Mac OSX.

Ricky, can you explain step-by-step how I can reproduce the error locally that you reported in comment 0? That would help me confirm my suspicion. Is B2G.app completely broken, or just in a particular test suite?

Someone on the b2g team might want to make a call whether we want to invest the time to change B2G's GreD to point to the old Contents/MacOS or if the time is better spent changing the package layout to match Firefox Desktop (which would also allow for B2G.app to be signed with Apple's v2 signatures). Benjamin, are you able to make this call, or would you know who could?
Flags: needinfo?(benjamin)
No, I really don't know.
Flags: needinfo?(benjamin)
Flags: needinfo?(ricky060709)
STR:

1. Clone the gaia repo. If you're not familiar with it, see [1].
2. Apply the patch "B2G_SDK_VERSION := 36.0a1" from bug 1089710. It's a very simple patch that only modifies two lines in gaia/Makefile.
3. cd into the gaia folder and run |make|.

You will then see an error like NS_ERROR_FILE_NOT_FOUND, as I mentioned in [2].
Our patch changed the B2G version from 34.0a1 to 36.0a1, and that breaks the build system.


[1] https://developer.mozilla.org/fr/Firefox_OS/Platform/Gaia/Hacking
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1089710#c6
(In reply to Stephen Pohl [:spohl] from comment #3)

> Someone on the b2g team might want to make a call whether we want to invest
> the time to change B2G's GreD to point to the old Contents/MacOS or if the
> time is better spent changing the package layout to match Firefox Desktop
> (which would also allow for B2G.app to be signed with Apple's v2
> signatures). Benjamin, are you able to make this call, or would you know who
> could?

Stephen,

I would rather change the package layout to match Fx Desktop, but I have no idea how much work this means for you.
(In reply to Fabrice Desré [:fabrice] from comment #6)
> Stephen,
> 
> I would rather change the package layout to match Fx Desktop, but I have no
> idea how much work this means for you.

Changing the package layout itself shouldn't take too long. I'm more concerned about all the different ways that we might break B2G on OSX and all the test suites that need updating. I would either need someone to help me verify the changes and make necessary changes to the client and test suites, or I need a very detailed list of:

1. What builds we care about.
2. How to build them locally.
3. How to verify that they work (Are automated tests enough? Do we need to verify on devices?).
4. What test suites need to pass (also specify if these tests need to pass when invoked via |mach| or |mozharness|, or both).
5. What version of Firefox are we targeting?

To give you an idea: less than 20% of my time was spent making changes to the actual package layout on Desktop. About 30% was spent on making client side changes in Firefox to find the files in their new location. 50% or more was spent on updating test suites. Considering that I was familiar with the platform (Desktop Firefox OSX), but not with B2G, I expect the client changes and test suites to take even more time compared to the package layout changes.

I'll get started on the package layout changes, but help with the rest would be greatly appreciated.
Depends on: 1047584
Summary: NS_ERROR_FILE_NOT_FOUND in Cu.import("resource://gre/bin/modules/...") → Modify file structure of B2G.app to allow for OSX v2 signing and for building of gaia after OSX v2 changes to Firefox (bug 1047584)
Posted patch Modify file structure (obsolete) — Splinter Review
This should take care of the bundle's directory structure. I've verified that I can now run 'make' for gaia and start it.

I've kicked off a fairly extensive try run to see where we stand:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=f247092a66d0

If there is anything else that we need besides this try run (see comment 7), please let me know.
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
We got all the results from the try. I see some oranges, are they perma?
I seem to remember that at the time of my last try run, there were a bunch of intermittent failures happening on try that were unrelated to the changes here. I've kicked off a fresh try run, which will hopefully result in fewer oranges:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b5682f86539c
Comment on attachment 8529266 [details] [diff] [review]
Modify file structure

Try is green. However, I have no idea if that's sufficient (see questions in comment 7).

Fabrice, do you know who can review this and answer the questions in comment 7?
Attachment #8529266 - Attachment description: Modify file structure (wip) → Modify file structure
Flags: needinfo?(fabrice)
Stephen, that's a tough one as we don't have anyone that cares especially about Mac.

In comment 7 you say that you spent time making sure that Firefox finds the files in their new location. Which resources was it? We don't have any toolkit/ based UI in b2g so we may be in a better situation.
Flags: needinfo?(fabrice)
Comment on attachment 8529266 [details] [diff] [review]
Modify file structure

(In reply to Fabrice Desré [:fabrice] from comment #12)
> Stephen, that's a tough one as we don't have anyone that cares especially
> about Mac.

Robert, since this is almost identical to what was done on Mac, would you feel comfortable reviewing this?

> In comment 7 you say that you spent time making sure that Firefox finds the
> files in their new location. Which resources was it? We don't have any
> toolkit/ based UI in b2g so we may be in a better situation.

The approach for Firefox was mostly based on trial-and-error. If something broke while running Firefox, it was most likely due to a resource that couldn't be found. This would be repeated until Firefox appeared to run normally again. Ugly and time-consuming, but it worked.

Unfortunately, since I'm unfamiliar with running Mac b2g builds, I wouldn't even know where to start or what to look for...
Attachment #8529266 - Flags: review?(robert.strong.bugs)
Comment on attachment 8529266 [details] [diff] [review]
Modify file structure

I'm fine with this from a code perspective. Please verify that the end result build is what is expected regarding file locations before landing.
Attachment #8529266 - Flags: review?(robert.strong.bugs) → review+
Thanks, Robert. The file locations are what I currently expect. However, we may want to further improve this before signing with v2 signatures, so I'm dropping that part from the title of the bug. There are a significant number of executables left in Contents/Resources. Most of these are tests related. Since I don't know if these tests are run on try, or how to run those manually, I don't feel comfortable moving those yet.

I suggest that we go ahead and land this patch as-is to allow for building of gaia on OSX. This would also allow us to detect any fallout and fix in followup bugs, similar to the way we did it with Desktop Firefox.
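One way to enumerate the executables still sitting in Contents/Resources is a small script like the one below. This is a minimal sketch and not part of the patch: `find_executables` is a hypothetical helper, and it assumes that "executable" can be approximated by "regular file with an execute bit set". That assumption counts scripts as well and misses binaries that lost their execute bit.

```python
import os
import stat


def find_executables(resources_dir):
    """Return sorted relative paths of regular files under resources_dir
    that have at least one execute bit set (hypothetical helper)."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    found = []
    for root, _dirs, files in os.walk(resources_dir):
        for name in files:
            path = os.path.join(root, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and mode & exec_bits:
                found.append(os.path.relpath(path, resources_dir))
    return sorted(found)
```

Running `find_executables("B2G.app/Contents/Resources")` against a packaged build would give a list to review before moving anything to Contents/MacOS.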
Summary: Modify file structure of B2G.app to allow for OSX v2 signing and for building of gaia after OSX v2 changes to Firefox (bug 1047584) → Modify file structure of B2G.app to allow for building of gaia after OSX v2 changes to Firefox (bug 1047584)
Fabrice, are you ok with me landing this patch? I've verified that the try build starts up and runs, but beyond that I'm not sure what to check for. If you want to try it for yourself, a green try run is referenced in comment 10.
Flags: needinfo?(fabrice)
Yep, I'm fine with landing. Feeling adventurous today ;)
Flags: needinfo?(fabrice)
Sounds possible. Unfortunately, this is about as far as I can go myself. Hopefully, fixing these tests won't be too difficult for someone familiar with these tests and b2g.
Assignee: spohl.mozilla.bugs → nobody
Status: ASSIGNED → NEW
Fabrice, Ricky, do you know who can finish this? 

It's been blocking us for over 2 months now from landing bug 994357 and from doing integration tests on MacOS.
Flags: needinfo?(ricky060709)
Flags: needinfo?(fabrice)
@Stephen Pohl, could this be a path issue in the test scripts? Or do you know who can help us? (I have no idea how b2g works.)
Flags: needinfo?(ricky060709)
Unfortunately, I have no idea. The concerning part is that my try builds turned green when I tested this patch. After landing on inbound, many builds turned green as well, but failures started happening more frequently as time went on. In other words: the tests might have been using old builds to run tests. If this is true, I'd expect that the failures started disappearing slowly, rather than abruptly, after backing out this patch. Ryan, would you happen to know whether this was the case or not?
Flags: needinfo?(ryanvm)
They went away quickly. My recollection is that we only had one occurrence of the failures after the backout landed on inbound.
Flags: needinfo?(ryanvm)
Looks like we have a patch, and just need to get the tests re-enabled here? Normally I'd probably bug :zac about this, but since he's gone I'm not sure who the best PoC is.

Jonathan - could you possibly take a look or suggest someone? We'd like to investigate why gaia ui tests fail with this patch.
Flags: needinfo?(jgriffin)
I'm running the gaia-ui tests on desktop b2g, using a downloaded build.
I can perhaps try to test the patch with a locally built gaia, but the last time I tried, I had trouble doing that.
Flags: needinfo?(fabrice)
The "can't find port" error sometimes means that the test target didn't launch correctly, I believe. 

Given the change, I wonder if it's as simple as that the harness tries to launch B2G Desktop from the @BINPATH@ Contents/MacOS instead of the @RESPATH@ Contents/Resources. On a side note, it's honestly a little surprising (in that least surprise kind of way) that @BINPATH@ means something different between the platforms and isn't really the path to the binary on Mac.

I also notice the log follows that with "04:34:04 INFO - Could not kill process, could not find pid: 2615, assuming it's already dead" -- that supports the idea that the process simply didn't launch.
Scratch the side comment--I misread the patch comment re: how @BINPATH@/@RESPATH@ works. Path change to the launcher (or some other dependency that prevents launch) still seems likely to me.
(In reply to Geo Mealer [:geo] from comment #27)
> The "can't find port" error sometimes means that the test target didn't
> launch correctly, I believe. 
> 
> Given the change, I wonder if it's as simple as that the harness tries to
> launch B2G Desktop from the @BINPATH@ Contents/MacOS instead of the
> @RESPATH@ Contents/Resources.

You're right that it's most likely a mixup between @BINPATH@ and @RESPATH@, but it should be the other way around. It should launch executables out of Contents/MacOS, i.e. @BINPATH@ but may be looking for it under @RESPATH@, in which case we should update the test. Or, it's looking for the executable under @BINPATH@ but the executable wasn't copied there during packaging (see comment 15 regarding test executables that are left in @RESPATH@), in which case it should be added to the MacOS-files.in file.

> On a side note, it's honestly a little
> surprising (in that least surprise kind of way) that @BINPATH@ means
> something different between the platforms and isn't really the path to the
> binary on Mac.

@BINPATH@ is supposed to mean the same thing on all platforms, i.e. the path to the executable. Only @RESPATH@ differs between platforms and unfortunately, due to Apple's v2 signing, there really is no way around this (also see [1]).
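The distinction can be sketched in a few lines of Python. This is an illustration only, assuming the Mac layout described in this bug (executables under Contents/MacOS, everything else under Contents/Resources). `bundle_paths` and `locate` are hypothetical names and are not part of the packaging code or any test harness.

```python
import os


def bundle_paths(app_bundle):
    """Map the two manifest placeholders to directories inside a .app bundle.
    Assumed layout: BINPATH -> Contents/MacOS, RESPATH -> Contents/Resources."""
    contents = os.path.join(app_bundle, "Contents")
    return {
        "BINPATH": os.path.join(contents, "MacOS"),
        "RESPATH": os.path.join(contents, "Resources"),
    }


def locate(app_bundle, name, executable):
    """Return where a file is expected to live: executables under BINPATH,
    all other files under RESPATH."""
    paths = bundle_paths(app_bundle)
    return os.path.join(paths["BINPATH" if executable else "RESPATH"], name)
```

Under this assumption, a harness looking for `b2g-bin` would call `locate(bundle, "b2g-bin", executable=True)`, while a lookup for a resource file would pass `executable=False`.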
 
> I also notice the log follows that with "04:34:04 INFO - Could not kill
> process, could not find pid: 2615, assuming it's already dead" -- that
> supports the idea that the process simply didn't launch.

Agreed. What I couldn't figure out was if this was the b2g process, or one of the test executables. Also, the fact that this failure was intermittent concerned me.


[1] http://mxr.mozilla.org/mozilla-central/source/browser/installer/package-manifest.in#29
Oh, mid-aired. Oh well, maybe the additional info is still helpful.
I'm not clear what the ask is here.  Are you asking us to get Gij tests running on OSX b2gdesktop?
Flags: needinfo?(jgriffin) → needinfo?(kgrandon)
(In reply to Jonathan Griffin (:jgriffin) from comment #31)
> I'm not clear what the ask is here.  Are you asking us to get Gij tests
> running on OSX b2gdesktop?

We know that. They fail because master gaia uses features that are not present in Gecko 34 and we use Gecko 34 when running Gij.

But we're trying to understand why tests fail with this patch (that basically enables us to update Gecko to 36 - see bug 1089710)
It seems like we have some good information in comment 28 and comment 29, now we just need someone to dig into the code. I'd volunteer, but I won't be able to look till after the holidays most likely. If this bug is still available then I wouldn't mind digging in. Hopefully someone will have some time before then though.
Flags: needinfo?(kgrandon)
(In reply to Kevin Grandon :kgrandon from comment #33)
> It seems like we have some good information in comment 28 and comment 29,
> now we just need someone to dig into the code. I'd volunteer, but I won't be
> able to look till after the holidays most likely. If this bug is still
> available then I wouldn't mind digging in. Hopefully someone will have some
> time before then though.

I dug through the code and log a bit after I made my comment. It's launching from the correct path as far as b2g-bin goes--that's really a command-line parameter. 

I have no idea if a dependency is blocking the launch, though. I didn't see anything obvious in terms of cobbling together a path that would affect this, but that doesn't mean I didn't miss something.

jgriffin's comment 31 gives me pause. Are we trying to run on an unsupported platform?

If it is supported, and we're not seeing this problem elsewhere, do other harnesses launch:

/builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/b2g-bin ?

That's what the harness tried to launch per the logs.

Keep in mind I'm making an educated guess re: launcher, too. I'd feel better if jgriffin or someone else closer to Marionette validated it before we assume it's correct.
Flags: needinfo?(jgriffin)
Other tests invoke /builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/b2g-bin

This seems like something related to the Marionette JS stuff, so I'm passing the needinfo to James.
Flags: needinfo?(jgriffin) → needinfo?(jlal)
Some scripts living in Gaia hard-code their b2g path to B2G.app/Contents/MacOS/b2g-bin. I'm not sure: is it wrong to point to B2G.app/Contents/MacOS/b2g-bin?
nominating as blocking b2g since it makes testing gaia 2.2 impossible on desktop.
blocking-b2g: --- → 2.2?
(In reply to Carsten Book [:Tomcat] from comment #19)
> backedout since this might have caused test failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=4647788&repo=mozilla-
> inbound

Does it relate to bug 905324?
With 30 NI requests on jlal, is there anyone else who can unblock us with the test here? 

With today's branch date and FC approaching, this is pushing an important l10n feature out of 2.2 because we can't land tests for it.
Alex/George/Ricky - Do any of you guys have cycles to own this issue? It's bad to be stalled on an older version, and since this is blocking people I think we need to get this resolved soon.

I know this isn't particular to gaia "build" code, but I generally think that people on the build team might have the best bet at solving this. Let me know if you can't get to it and we can start looking for other people to pull in.
Flags: needinfo?(ricky060709)
Flags: needinfo?(poirot.alex)
Flags: needinfo?(jlal)
Flags: needinfo?(gduan)
I think the patch itself is already complete, but the problem has turned into Gip failures. Also, I don't know who is the best person to answer comment 35.
Flags: needinfo?(ricky060709)
The open question seems to be why the patch from comment 18 causes intermittent gaia-ui-test failures (not Gij) only on Mac, which I'm not familiar with. I think jgriffin or jlal probably has an idea about it.
Flags: needinfo?(gduan)
Hi jgriffin, would you mind taking another look at the error message posted in comment 19? It seems to be a Gip issue, not a Gij one.
Flags: needinfo?(jgriffin)
I tried to rebase the patch and pushed to try again. I'm lost in all the various back and forth between everyone here. It would be great to have a fresh error log/build:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=97011202f714
And again, if that doesn't fail on try, I would like to know why we keep having failures that only happen when landing...
Flags: needinfo?(poirot.alex)
Trying to summarize:
There are two issues that I'm aware of:
1. Test failures only occur intermittently, but at increasing frequency after landing on inbound.
2. The tests fail for an unknown reason. The affected tests are on B2G Desktop OSX opt[1]. Specifically:
   - Gaia Python Accessibility Integration Tests (a)
   - Gaia Python Functional Integration Tests (f1, f2, f3)
   - Gaia Python Integration Unit Tests (u)

Regarding 1: The patch did not cause failures when tested on try. It also landed fine originally, but over time, errors for B2G Desktop OSX opt started appearing at increasing frequency (see [1]). This is of concern since there is no good explanation for it. Do these tests use old bundles to run their tests? Does OSX launch an old .app bundle because it thinks it's the same as the one we intend to test? We may or may not be able to reproduce these failures on try by relaunching the same test jobs a bunch of times. I do suggest that we verify that the proper .app bundle is used to run these tests.

Regarding 2: The initial discussion and possible solutions are in comment 28 and 29. Once we've eliminated the intermittency and someone comes forward who can run these tests locally, it should be fairly straightforward to test the possible solutions.
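To check the suggestion under "Regarding 1" that the proper .app bundle is being used, one option is to fingerprint the bundle the build job produced and the bundle the test job actually unpacked, then compare the two. The sketch below is only an assumption about how that could be done. `bundle_fingerprint` is a hypothetical helper, nothing in the existing automation is known to do this, and it skips symlinks for simplicity.

```python
import hashlib
import os


def bundle_fingerprint(app_bundle):
    """Return a SHA-256 hex digest over the relative path and contents of
    every regular file in the bundle (hypothetical helper)."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(app_bundle):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            rel = os.path.relpath(path, app_bundle)
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

If the fingerprint logged by the build job differed from the one logged by the test job, that would point at a stale or mismatched bundle rather than at the tests themselves.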


[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=da9e07d0a41c
Ricky, unfortunately, those logs are over a month old and no longer exist, so I can't investigate.  The try run in comment #44 looks fine, though...

If we don't see a problem on try, but we do on regular branches, it may be due to clobbering.  Try jobs are always clobbered; inbound branches aren't.
Flags: needinfo?(jgriffin)
Perhaps we should land it again and see what happens on inbound?
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Ricky Chien [:rickychien] from comment #47)
> Perhaps we should land it again and see what's happen on inbound?

This is not my call to make. If this is the route we want to go, I suggest that:
1. Someone experienced with these tests and/or B2G Desktop OSX takes ownership of this bug.
2. We let the sheriffs know ahead of time that this is landing again.
3. The owner of this bug closely watches the tree.
4. If okay with sheriffs, the owner debugs the issue on inbound should it arise again and collects any logs that are needed for investigating the issue should the patch be backed out again.
5. The patch is backed out if no solution can be found in a reasonable amount of time.

(In reply to Jonathan Griffin (:jgriffin) from comment #46)
> If we don't see a problem on try, but we do on regular branches, it may be
> due to clobbering.  Try jobs are always clobbered; inbound branches aren't.

This wouldn't explain the intermittent failures that appear more and more frequently though, would it? If it was a clobber issue, shouldn't the failures disappear over time, not the other way around?
Flags: needinfo?(spohl.mozilla.bugs)
I think it should be landed again, and monitored.  I'm happy to watch for errors and attempt to identify the problem, although I'm not too familiar with these tests.  But, it doesn't seem like a test problem anyway per se, since the error occurs on startup.

I'm PTO Jan 24-28, though, so I don't know if we should do this now or wait until Jan 29.
Thanks, Jonathan! If I recall correctly, the errors started appearing fairly quickly after landing. It just took a while to attribute them to this bug. Once you're ready I'm happy to refresh the patch for current trunk and push again, just let me know.
Fire away!
Posted patch Patch (obsolete) — Splinter Review
Updated for current trunk, carrying over r+.
Attachment #8529266 - Attachment is obsolete: true
Attachment #8552605 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/39c78d4281d5
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
I haven't seen any of the previous intermittent failures since landing on fx-team, so I have to assume that this got fixed along the way somewhere. Let's hope it sticks on trunk as well! Thanks, everyone!
This looks like it has resurfaced.

The only error produced is:

1421949954413	Marionette	INFO	marionette enabled via build flag and pref
[8139] ###!!! ABORT: LoadSheetSync failed with error 80040111 loading built-in stylesheet 'resource://gre-resources/counterstyles.css': file /builds/slave/fx-team-osx64_g-00000000000000/build/layout/style/nsLayoutStylesheetCache.cpp, line 378
[8139] ###!!! ABORT: LoadSheetSync failed with error 80040111 loading built-in stylesheet 'resource://gre-resources/counterstyles.css': file /builds/slave/fx-team-osx64_g-00000000000000/build/layout/style/nsLayoutStylesheetCache.cpp, line 378

This does not have anything to do with Marionette.  I suppose it could be possible that it has something to do with not clobbering the gaia build directory between builds, so I could try that to fix this, but I really have no idea what's going wrong here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Also, I should reiterate this has nothing to do with any of the individual tests; the build is failing to launch, which may (or may not) have something to do with the profile (i.e. the gaia build) that's passed to it.
(In reply to Jonathan Griffin (:jgriffin) from comment #58)
> I suppose it could be
> possible that it has something to do with not clobbering the gaia build
> directory between builds, so I could try that to fix this, but I really have
> no idea what's going wrong here.

This is not the case; we actually do clobber the entire gaia directory (including the build directory) during each run already.
The test harness is still using B2G.app/Contents/MacOS/b2g, should it be using B2G.app/Contents/Resources/b2g-bin or something else instead?
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Jonathan Griffin (:jgriffin) from comment #61)
> The test harness is still using B2G.app/Contents/MacOS/b2g, should it be
> using B2G.app/Contents/Resources/b2g-bin or something else instead?

No, B2G.app/Contents/MacOS/b2g is correct since it's the main executable of the .app bundle, which should always be in Contents/MacOS.
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Jonathan Griffin (:jgriffin) from comment #58)
> The only error produced is:
> 
> 1421949954413	Marionette	INFO	marionette enabled via build flag and pref
> [8139] ###!!! ABORT: LoadSheetSync failed with error 80040111 loading
> built-in stylesheet 'resource://gre-resources/counterstyles.css': file
> /builds/slave/fx-team-osx64_g-00000000000000/build/layout/style/
> nsLayoutStylesheetCache.cpp, line 378
> [8139] ###!!! ABORT: LoadSheetSync failed with error 80040111 loading
> built-in stylesheet 'resource://gre-resources/counterstyles.css': file
> /builds/slave/fx-team-osx64_g-00000000000000/build/layout/style/
> nsLayoutStylesheetCache.cpp, line 378

Does this only occur with the patch here applied?
Could you point me to a run on treeherder that experienced these failures?
Is this still an intermittent failure, or permanent?
Flags: needinfo?(jgriffin)
Yes, this occurs with this patch, but intermittently, see e.g.,

https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=2dd88e466daf

You need to look at the gecko_log file to see this error, which for the above runs is at:

http://mozilla-releng-blobs.s3.amazonaws.com/blobs/fx-team/sha512/12b108ad9ff7628245d9f75d70144b3867eb6867e244a56f75ee9df707c89575f5fc661ef3a715d43469ccdbeb3d209c26522056492450a99a7d59b0d07911a4

This is looking like a problem with the build itself, and not with the test job.  Retriggers that occur where other jobs were green are also green, and retriggers where all jobs are red are also red.  So it appears that the build job is intermittently producing builds that cause this behavior.
Flags: needinfo?(jgriffin)
(In reply to Jonathan Griffin (:jgriffin) from comment #65)
> Does https://dxr.mozilla.org/mozilla-central/source/CLOBBER need to be
> touched?

I was going to say no, but at this point anybody's guess is as good as mine...

I diffed a good build with a bad one. There are no meaningful differences. Looking at tests and logs now...
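For anyone who wants to repeat the comparison of a good and a bad build, a recursive directory diff can be sketched with Python's standard `filecmp` module. This is an assumed approach, not the tool that was used for the diff above, and `diff_bundles` is a hypothetical name. Note that `filecmp.dircmp` skips a default list of names such as `.git` and `__pycache__`.

```python
import filecmp
import os


def diff_bundles(good, bad):
    """Recursively compare two directory trees. Returns three sorted lists of
    relative paths: only in good, only in bad, and present in both but with
    different contents."""
    only_good, only_bad, differing = [], [], []

    def walk(cmp, prefix):
        only_good.extend(os.path.join(prefix, n) for n in cmp.left_only)
        only_bad.extend(os.path.join(prefix, n) for n in cmp.right_only)
        # dircmp's own diff_files relies on a shallow stat comparison,
        # so compare the common files byte by byte instead.
        for name in cmp.common_files:
            left = os.path.join(cmp.left, name)
            right = os.path.join(cmp.right, name)
            if not filecmp.cmp(left, right, shallow=False):
                differing.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(good, bad), "")
    return sorted(only_good), sorted(only_bad), sorted(differing)
```

Three empty lists for the two unpacked B2G.app directories would be consistent with the "no meaningful differences" observation, at least at the file level.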
I'm seeing the following difference in the log.

For the build that successfully runs the tests:
[...]
08:22:30     INFO - Calling ['/builds/slave/talos-slave/test/build/venv/bin/python', '-u', '/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/cli.py', '--restart', '--timeout=10000', '--type=b2g', '--testvars=/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/testvars.json', '--profile=/builds/slave/talos-slave/test/gaia/profile', '--symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/fx-team-macosx64_gecko/1421940249/en-US/b2g-38.0a1.en-US.mac64.crashreporter-symbols.zip', '--gecko-log=/builds/slave/talos-slave/test/build/blobber_upload_dir', '--xml-output=/builds/slave/talos-slave/test/build/output.xml', '--html-output=/builds/slave/talos-slave/test/build/blobber_upload_dir/output.html', '--log-raw=/builds/slave/talos-slave/test/build/blobber_upload_dir/marionette_raw.log', '--binary=/builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/b2g-bin', '--address=localhost:2828', '--total-chunks=1', '--this-chunk=1', '/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/tests/accessibility/manifest.ini'] with output_timeout 1000
08:22:31     INFO -  Results will not be posted to Treeherder. Please set the following environment variables to enable Treeherder reports: TREEHERDER_KEY, TREEHERDER_SECRET
08:22:35     INFO -  starting httpd
08:22:35     INFO -  running webserver on http://127.0.0.1:49228/
08:22:35     INFO -  mozversion Unable to find /builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/application.ini
08:22:35     INFO -  mozversion Unable to find /builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/platform.ini
08:22:35     INFO -  mozversion Error pulling gaia file
08:22:35     INFO -  SUITE-START | Running 26 tests
[...]

For the build with test failures:
[...]
10:05:38     INFO - Calling ['/builds/slave/talos-slave/test/build/venv/bin/python', '-u', '/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/cli.py', '--restart', '--timeout=10000', '--type=b2g', '--testvars=/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/testvars.json', '--profile=/builds/slave/talos-slave/test/gaia/profile', '--symbols-path=https://ftp-ssl.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/fx-team-macosx64_gecko/1421947268/en-US/b2g-38.0a1.en-US.mac64.crashreporter-symbols.zip', '--gecko-log=/builds/slave/talos-slave/test/build/blobber_upload_dir', '--xml-output=/builds/slave/talos-slave/test/build/output.xml', '--html-output=/builds/slave/talos-slave/test/build/blobber_upload_dir/output.html', '--log-raw=/builds/slave/talos-slave/test/build/blobber_upload_dir/marionette_raw.log', '--binary=/builds/slave/talos-slave/test/build/application/B2G.app/Contents/MacOS/b2g-bin', '--address=localhost:2828', '--total-chunks=1', '--this-chunk=1', '/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/tests/accessibility/manifest.ini'] with output_timeout 1000
10:05:38     INFO -  Results will not be posted to Treeherder. Please set the following environment variables to enable Treeherder reports: TREEHERDER_KEY, TREEHERDER_SECRET
10:06:39     INFO -  Traceback (most recent call last):
10:06:39     INFO -    File "/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/cli.py", line 4, in <module>
10:06:39     INFO -      main()
10:06:39     INFO -    File "/builds/slave/talos-slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/runtests.py", line 111, in main
10:06:39     INFO -      cli(runner_class=GaiaTestRunner, parser_class=GaiaTestOptions)
10:06:39     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/marionette/runtests.py", line 35, in cli
10:06:39     INFO -      runner = startTestRunner(runner_class, options, tests)
10:06:39     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/marionette/runtests.py", line 20, in startTestRunner
10:06:39     INFO -      runner.run_tests(tests)
10:06:39     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/marionette/runner/base.py", line 746, in run_tests
10:06:39     INFO -      self.start_marionette()
10:06:39     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/marionette/runner/base.py", line 691, in start_marionette
10:06:39     INFO -      self.marionette = Marionette(**self._build_kwargs())
10:06:39     INFO -    File "/builds/slave/talos-slave/test/build/venv/lib/python2.7/site-packages/marionette/marionette.py", line 524, in __init__
10:06:39     INFO -      assert(self.wait_for_port()), "Timed out waiting for port!"
10:06:39     INFO -  AssertionError: Timed out waiting for port!
10:06:39     INFO -  Could not kill process, could not find pid: 4225, assuming it's already dead
10:06:39    ERROR - Return code: 1
10:06:39     INFO - gecko.log not found
10:06:39     INFO - TinderboxPrint: marionette: <em class="testfail">T-FAIL</em>
10:06:39    ERROR - Marionette exited with return code 1: harness failures
10:06:39    ERROR - # TBPL FAILURE #
10:06:39     INFO - Running post-action listener: _resource_record_post_action
10:06:39     INFO - Running post-run listener: _resource_record_post_run
10:06:40     INFO - Total resource usage - Wall time: 111s; CPU: 13.0%; Read bytes: 17793024; Write bytes: 1118330880; Read time: 11741; Write time: 718389
10:06:40     INFO - install - Wall time: 26s; CPU: 13.0%; Read bytes: 206735872; Write bytes: 459397120; Read time: 19978; Write time: 99807
10:06:40     INFO - run-marionette - Wall time: 86s; CPU: 13.0%; Read bytes: 12935168; Write bytes: 647444480; Read time: 10358; Write time: 618437
10:06:40     INFO - Running post-run listener: _upload_blobber_files
10:06:40     INFO - Blob upload gear active.
[...]


For the good build, there's obviously a problem as well since it's trying to find B2G.app/Contents/MacOS/application.ini and B2G.app/Contents/MacOS/platform.ini, which are now under Contents/Resources. I'm not sure why it's still able to run the tests while the bad build stops with an AssertionError...
So, it looks like we need to change the location where we look for application.ini and platform.ini in mozversion. And, we might want to check why httpd is able to run sometimes, but not at other times. Jonathan, do you agree?
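A rough sketch of what such a lookup could do, assuming we simply probe the new Contents/Resources location first and fall back to the old Contents/MacOS one. The helper name `find_ini` and the `INI_DIRS` tuple are made up for illustration; this is not mozversion's actual code.

```python
import os

# Candidate directories inside the .app bundle, newest layout first.
# Both names come from the discussion above; the ordering is an assumption.
INI_DIRS = ("Contents/Resources", "Contents/MacOS")


def find_ini(bundle_path, name):
    """Return the first existing path to `name` inside the bundle, or None."""
    for rel_dir in INI_DIRS:
        candidate = os.path.join(bundle_path, rel_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Something like `find_ini("/path/to/B2G.app", "application.ini")` would then work against both the old and the new package layout.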
Flags: needinfo?(jgriffin)
(In reply to Stephen Pohl [:spohl] from comment #68)
> So, it looks like we need to change the location where we look for
> application.ini and platform.ini in mozversion. And, we might want to check
> why httpd is able to run sometimes, but not at other times. Jonathan, do you
> agree?

No, both of those things are red herrings (although I agree we should fix the first).  The harness launches the binary before it launches httpd or uses mozversion.  In the passing case, we see those in the log, but in the failing case we don't, since we aren't launching the binary successfully.
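To illustrate why the failing log ends with "Timed out waiting for port!": the harness polls for the Marionette port after launching the binary, so if the binary never comes up, the poll simply runs out of time. Below is a simplified standalone sketch of that kind of check. It is not the actual Marionette implementation, and the parameter names and defaults are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry shortly.
            time.sleep(interval)
        else:
            sock.close()
            return True
    return False
```

If nothing ever listens on the port, this returns False once the deadline passes, which is the condition the assert in the traceback trips on.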
Flags: needinfo?(jgriffin)
Backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/d775f082aded

I can land a CLOBBER touch if we want to test that theory.
(In reply to Wes Kocher (:KWierso) from comment #70)
> Backed this out in
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d775f082aded
> 
> I can land a CLOBBER touch if we want to test that theory.

I think it would be a worthwhile test.
(In reply to Jonathan Griffin (:jgriffin) from comment #71)
> I think it would be a worthwhile test.

https://hg.mozilla.org/integration/mozilla-inbound/rev/21300890038b
https://hg.mozilla.org/mozilla-central/rev/21300890038b
Status: REOPENED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S4 (23jan)
Re-opening, as the core commit has been backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What's the next step here?
Flags: needinfo?(spohl.mozilla.bugs)
:jgriffin suggested relanding with a clobber, but we're not sure if this will change anything. If someone could actually test with a machine that reproduces this issue, we might be able to pinpoint the problem a lot faster. But this requires a) access to an affected test machine and b) familiarity with b2g desktop osx and/or the affected tests, both of which I don't have...
Flags: needinfo?(spohl.mozilla.bugs)
Jonathan, did you want to try relanding this with a clobber-touch? I'm still not convinced that this will change anything, but I'm happy to try it.
Flags: needinfo?(jgriffin)
I'm not opposed to trying it, but I agree it doesn't seem likely to solve the problem.
Flags: needinfo?(jgriffin)
Wasn't a clobber attempted in comment 72/73?
The clobber in comment 72 happened after the patch was already backed out. We'd need to resubmit the patch along with a clobber to test this.
I continue to have low confidence in a clobber being the key here, and being responsible for tree closures isn't fun so I'll pass for now. Since I'm still unable to make any progress on this myself (see comment 76), I'm going to unassign myself for now. I'm happy to update the patch for current trunk again once there's movement here.
Assignee: spohl.mozilla.bugs → nobody
Thanks Stephen!

I'll try to keep making sure that there's someone actively trying to figure out what's going on.

On IRC, James said he wants to take a look at this, so I'm setting an NI on him :)
Flags: needinfo?(jlal)
triage:
blocking per comment 37.
changing component to a more suitable one.
blocking-b2g: 2.2? → 2.2+
Component: Emulator → Gaia::Build
The core problem here is that we are using an older b2g-desktop to run the build system, and from what I can see on OSX, the newer builds are missing some bits (or do not allow us to access them). I will continue to debug here, but it would be great if anyone is willing to pitch in...

The "easy" solution is to figure out why the deps are missing and fix them. There is also a longer-term issue: I still think (and this is an old argument) that the xpcshell/b2g-desktop based build system has some architectural challenges... We could fulfill the needs of the l10n preprocess step now in various other ways (ServiceWorker / using an app to spit out this data).
Flags: needinfo?(jlal)
This is what the patch is trying to do (see the commit in comment 56) but something is still missing.
I am in the process of updating the version of xpcshell we use for building gaia in bug 1128643...maybe that will help?
(In reply to Zibi Braniecki [:gandalf] from comment #37)
> nominating as blocking b2g since it makes testing gaia 2.2 impossible on
> desktop.

AFAIK, most of the tests on tbpl are running with b2g37 for 2.2. What makes 2.2 gaia testing impossible?
Flags: needinfo?(gandalf)
(In reply to George Duan [:gduan] [:喬智] from comment #87)
> (In reply to Zibi Braniecki [:gandalf] from comment #37)
> > nominating as blocking b2g since it makes testing gaia 2.2 impossible on
> > desktop.
> 
> AFAK, most of tests on tbpl are running with b2g37 for 2.2, what make 2.2
> gaia testing impossible ?

You can't locally run Gip tests or Integration tests on Mac OS because they try to use B2G 34 [0].

Those tests started relying on code that was introduced to Gecko after 34, either in JS [1] or CSS [2].


[0] https://github.com/mozilla-b2g/gaia/blob/53d81870dc547fa57b915b06457142e4ee1051b2/Makefile#L295-L296
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1089710#c21
[2] https://groups.google.com/d/msg/mozilla.dev.gaia/TzOxy_VuE48/om9KiUnc-5kJ
Flags: needinfo?(gandalf)
In fact, there are two b2g builds in gaia (b2g and b2g_sdk): one is B2G34, for building gaia, and the other is B2G38 (now), for running Gij tests. When launching Gij tests, we try to download a newer B2G through mozilla-download [1]. However, I'm not familiar with Gip tests, so I'm not sure whether mozilla-download runs for Gip too.

[1] https://github.com/RickyChien/gaia/blob/5023cac43c6dc290424bc0ea16fa41de7f4aaa03/Makefile#L753-L758
Flags: needinfo?(gandalf)
After tracing the log from [1], it seems that Gip reads b2g from "--binary=/builds/slave/test/build/application/b2g/b2g-bin". We could upgrade this b2g, since it's different from the b2g we use for the gaia build system.
So I think this shouldn't block gaia v2.2 testing.


[1] http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/gaiabld-79bbfcf4d1cf/gaia-try-linux64_gecko/gaia-try_ubuntu64_vm-b2gdt_test-gaia-ui-test-functional-1-bm52-tests1-linux64-build894.txt.gz
Set 2.2? again because I think it shouldn't be a blocker due to comment 90 and comment 92.
blocking-b2g: 2.2+ → 2.2?
Flags: needinfo?(gandalf)
We really should not require two different b2g builds for gaia =(
(In reply to Ricky Chien [:rickychien] from comment #93)
> Set 2.2? again because I think it shouldn't be a blocker due to comment 90
> and comment 92.

I believe it should be a blocker.

Please, try launching `make test-integration APP=system` on mac right now.

It will not work because we rely on b2g_sdk[0] which is right now on version 34 [1] while the code relies on features added post b2g 34. So we're stuck.

I don't believe we should release a platform that requires an obsolete and unsupported gecko [2] for testing and running when the platform itself already requires a newer one.

[0] https://github.com/mozilla-b2g/gaia/blob/4730b2272822062cc4e0c658a2d64520febf3449/bin/gaia-marionette#L104
[1] https://github.com/mozilla-b2g/gaia/blob/4730b2272822062cc4e0c658a2d64520febf3449/Makefile#L301-L302
[2] just search for "b2g_sdk" in Makefile
Since `make test-integration` starts by building a profile, we could try to run the tests without building a profile by using `make test-integration-test` [1]. From what I saw, it runs on gaia/b2g and doesn't throw any error about b2g_sdk or try to build b2g_sdk, even though I've removed my b2g_sdk.


[1] https://github.com/RickyChien/gaia#running-tests-without-building-profile
I haven't verified this yet; it printed an error when launching marionette:

dyld: lazy symbol binding failed: Symbol not found: _node_module_register
...

However, I think it's weird that we download a new b2g from [1] and also use b2g_sdk [2]?

[1] https://github.com/mozilla-b2g/gaia/blob/4730b2272822062cc4e0c658a2d64520febf3449/bin/gaia-marionette#L52-L55
[2] https://github.com/mozilla-b2g/gaia/blob/4730b2272822062cc4e0c658a2d64520febf3449/bin/gaia-marionette#L104
Marionette installed successfully after I ran `rm -rf node_modules && npm install` again, and it launched properly with the b2g_sdk/ folder removed. If you remove b2g/, marionette will start downloading a newer b2g through mozilla-download.

As a result, I'm sure that won't be a blocker. =)
I'm sorry Ricky, but I fail to understand your reasoning. We have bugs that can't be fixed because of this bug, we have b2g34 referenced in our Makefile, and you claim it's not a blocker? Are you saying we can remove b2g_sdk and the Makefile instructions for it?
What I was looking into is the claim that this bug blocks gaia 2.2 testing, as you said in comment 37. I tried to verify it, and it seems that marionette is currently running on b2g38.
Maybe to be clearer: 
* b2g_34 is used only for building.
* latest b2g is used to run tests

I'd really like to see this bug fixed (I want to use some new syntax!). But like Ricky, I don't really understand why it should be a blocker.
B2G triage is not blocking based on Ricky and Julien's comments above.
blocking-b2g: 2.2? → ---
Ok. So what can we do to unblock ourselves from being stuck with b2g34 for build? We have another bug (bug 1137094) where we would love to be able to use <template/>.
I requested uplift to b2g34 of bug 1084502 which is the bug that makes it impossible to use <template> when building Gaia.
Stas, if I understand you correctly, the problem is that we use l10n.js during the build process run with b2g34, and you want to use <template> in l10n.js. Is that so?


This may be a very dumb idea, but can we surround the failing Cu.import with a try/catch block, and in the catch clause use the Cu.import that works on Mac? We can file a separate bug to do this as a workaround, and keep this bug to fix the root cause.

We have other places where we do this: [1].

[1] https://github.com/mozilla-b2g/gaia/blob/16e0aca16667d94ed9fed5ae7eed3a14da0d7986/tools/extensions/httpd/bootstrap.js#L15-L21
Also I see that jgriffin had similar issues when updating xpcshell in bug 1128643 but it was resolved in a later version.

I'm a bit lost because I don't think we use a separate xpcshell now, but we use the one in b2g_sdk, so I don't really get how bug 1128643 fits in.

I'll try to look at this today, some fresh eyes might help.
(In reply to Julien Wajsberg [:julienw] from comment #105)
> Stas, if I understand you correctly, the problem is that we use l10n.js
> during the build process run with b2g34, and you want to use <template> in
> l10n.js. Is that so?

Yes, this is precisely the problem I'm facing.

Thanks for taking a look, Julien!
Here is a new try so that I can have a B2G on the Mac.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bb05710ead23
The try looks good, but as I understand the previous comments, the problem is intermittent with the _build_ itself, so there is no point in restarting the test builds on the try.

I'll unfortunately need to install Xcode on the Mac and build on it so that I can try repeatedly. I'm not really happy :)

Note that on treeherder, here is what happens:
* we use the generated B2G to run the test.
* we use a fixed B2G to run the build itself.
=> That means that we'll need to ask releng guys to update the fixed B2G (maybe they can use the generated one on the B2G builds?) after this bug is fixed anyway.
=> I don't think this is what's causing the trouble here though.
My opinion is that we don't _really_ care about the suites in MacOS X and we can disable them if it turns out they're too flaky, and work separately to reenable them.

I'm more concerned with updating B2G to a newer version for the build system than with B2G sometimes (!) not working.

I'll look at the build in [1] locally before requesting a new checkin.

[1] https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=2dd88e466daf
Too bad, the build from previous builds is not available anymore.
Posted patch rebased patch (obsolete) — Splinter Review
carry over r+
Attachment #8570401 - Flags: review+
Posted patch rebased patch v2 (obsolete) — Splinter Review
Attachment #8570401 - Attachment is obsolete: true
Attachment #8570434 - Flags: review+
Attachment #8570434 - Attachment is patch: true
Let's land this, again.

Sheriffs, in case of an obvious issue in MacOS X builds (all Gip tests are failing at once for a specific build) you can back out. Then I'll have a failing build I'll possibly be able to debug.
Keywords: checkin-needed
Hey Ryan, is there any reason the checkin-needed request is not fulfilled here?
Flags: needinfo?(ryanvm)
Because we've been too busy firefighting the last few days (dating back to last week) to handle most checkin-neededs, let alone ones that have been known to cause random mass bustage on multiple previous landing attempts.
Flags: needinfo?(ryanvm)
Attachment #8552605 - Attachment is obsolete: true
Fun story, Gip got disabled on OSX across all branches yesterday.
But now b2g/installer/package-manifest.in is bitrotted.
Keywords: checkin-needed
Thanks for letting us know! I'll update to current trunk today.
Assignee: nobody → spohl.mozilla.bugs
Thanks Stephen ! I already did it once so ping me if you happen to be busy with other things :)
Posted patch Patch — Splinter Review
Updated for current trunk. Carrying over r+.
Attachment #8570434 - Attachment is obsolete: true
Attachment #8572654 - Flags: review+
Keywords: checkin-needed
(In reply to Julien Wajsberg [:julienw] from comment #120)
> Thanks Stephen ! I already did it once so ping me if you happen to be busy
> with other things :)

Hey Julien, I totally missed this. But I appreciate the offer to help! Let's keep our fingers crossed that this sticks this time.
https://hg.mozilla.org/mozilla-central/rev/e5c18a7a6c98
Status: REOPENED → RESOLVED
Closed: 5 years ago → 4 years ago
Resolution: --- → FIXED
Depends on: 1142605