Make the B2G Flame builds meet the job visibility requirements

RESOLVED WONTFIX

Status

Firefox OS
General
4 years ago
2 years ago

People

(Reporter: emorley, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

4 years ago
From bug 1017525:

> The Flame builds are only scheduled on a periodic basis, and self-serve
> won't allow me to retrigger ones in between (no job present with matching
> name, when trying the usual trick).
> 
> As such the regression range is quite large:
> https://tbpl.mozilla.org/?tree=B2g-
> Inbound&jobname=flame&tochange=cff36dd6e717&fromchange=47876b558703
> 
> The lack of dep builds, the inability to easily retrigger, the unhelpfulness
> of the error message and the usual B2G obscurity due to multiple repos means
> the Flame builds do not meet
> https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy and so will be
> hidden until they do.

This meta bug is for fixing the deficiencies with the automation of these builds. I'll file a TBPL counterpart dependent on this bug, where the unhiding will take place.
(Reporter)

Updated

4 years ago
Blocks: 1017529
Comment 1

4 years ago
You should be able to trigger intermediate builds with this:

https://wiki.mozilla.org/ReleaseEngineering/How_To/Trigger_arbitrary_jobs
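As a rough illustration of what "triggering an arbitrary job" via self-serve/buildapi involves, the sketch below constructs the kind of rebuild URL such a request would target. The endpoint path and the builder name are assumptions for illustration only - the wiki page above is the authoritative procedure; the revision is the one from the regression range in comment 0.

```python
# Hypothetical sketch of triggering a build via buildapi self-serve.
# The endpoint base path and builder name are assumptions, not a
# confirmed API; consult the releng wiki page for the real steps.
from urllib.parse import quote


def build_trigger_url(branch, builder, revision,
                      base="https://secure.pub.build.mozilla.org/buildapi/self-serve"):
    """Construct the (assumed) self-serve URL for rebuilding a job."""
    return "%s/%s/builders/%s/%s" % (base, quote(branch), quote(builder),
                                     quote(revision))


url = build_trigger_url("b2g-inbound", "b2g_b2g-inbound_flame_dep",
                        "cff36dd6e717")
print(url)
# An authenticated POST to a URL of this shape (LDAP credentials) is
# what would actually request the build.
```

The point of the complaint in comments 2-3 is exactly that this kind of hand-rolled request is above what non-sheriffs can be expected to do until it is integrated into buildapi's UI.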
(Reporter)

Comment 2

4 years ago
I'm immensely grateful for the work that's been done so far on triggering arbitrary builds - I think it's going to be invaluable for sheriffs in the future and the way forward in general. However, until it's finished and integrated with buildapi or similar, it's above the threshold for what we can expect non-sheriffs to do, and so doesn't meet the visibility requirements.
(Reporter)

Comment 3

4 years ago
Also as Ryan reminded me on IRC, jobs triggered using that script do not appear in TBPL until they complete, which makes it hard to keep track of which revisions are under bisection.
(Reporter)

Comment 4

4 years ago
In response to bug 1016940 comment 5 (asking if fixes could be tested locally before pushing):

15:51 <gerard-majax_> edmorley|sheriffduty, I don't want to be harsh
15:51 <gerard-majax_> edmorley|sheriffduty, but it's highly painful to fix the flame
15:52 <gerard-majax_> edmorley|sheriffduty, because it means having two completely separate b2g tree to be 100% sure
15:52 <gerard-majax_> edmorley|sheriffduty, and I have no idea whether we can do try on this kind of things

Seems like we need to:
* Make sure we can run these on try, if not already
* Document how to run on try (especially for manifest changes)
* Make sure developers have an easier way to build locally
* Integrate arbitrary build retriggers into buildapi
* Ensure that arbitrary builds appear on TBPL when in the pending & running states
(Reporter)

Updated

4 years ago
Depends on: 1001484
Comment 5

4 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> Also as Ryan reminded me on IRC, jobs triggered using that script do not
> appear in TBPL until they complete, which makes it hard to keep track of
> which revisions are under bisection.

This will be fixed by bug 1009565
Depends on: 1009565

Comment 6

4 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #4)
> In response to bug 1016940 comment 5 (asking if fixes could be tested
> locally before pushing):
> 
> 15:51 <gerard-majax_> edmorley|sheriffduty, I don't want to be harsh
> 15:51 <gerard-majax_> edmorley|sheriffduty, but it's highly painful to fix
> the flame
> 15:52 <gerard-majax_> edmorley|sheriffduty, because it means having two
> completely separate b2g tree to be 100% sure
> 15:52 <gerard-majax_> edmorley|sheriffduty, and I have no idea whether we
> can do try on this kind of things
> 
> Seems like we need to:
> * Make sure we can run these on try, if not already
> * Document how to run on try (especially for manifest changes)
> * Make sure developers have an easier way to build locally
> * Integrate arbitrary build retriggers into buildapi
> * Ensure that arbitrary builds appear on TBPL when in the pending & running
> states

They won't be available on try, at least not in the short term. No device build has ever been available on try because we can't legally distribute the device builds, and Flame is no different. We are considering workarounds, but that cannot block this from being visible like all the other device builds.

I'm curious why it is only triggered periodically and not on every push.
Comment 7

4 years ago
I'll get the ball rolling on that. I filed bug 1019578 to run Flame builds per-push.
(Reporter)

Comment 8

4 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #6)
> They won't be available on try, at least not in the short term. No device
> build has ever been available on try due to the fact that we can't legally
> distribute the device builds, and flame is no different. We are considering
> work arounds to this, but that cannot block this from being visible like all
> the other device builds.

At the moment we're in a bit of a bind - we neither have the ability to push to try (either by resolving the legal issues or by having certain try job types that don't upload builds to a public location) nor do we have a local development workflow that devs are willing to subject themselves to.

If the flame is as important a device as we're making out (and I agree it is), we should be resolving the local dev workflow issue at least as a stopgap to a semi-private-try setup (in addition to finding the capacity to fix bug 1019578).

To me this just seems like yet another example of needing to put our money where our mouth is: if we believe this device is important to us, then we must devote the resources (human and CPU cycles) to it.
Depends on: 1019578

Comment 9

4 years ago
No arguments, Ed. I agree with you.

As I understand it, the developer workflow will be as follows: 
1. Get the base build from our partner (T2M)
2. Download gaia + gecko from FTP
3. Flash to phone

Once you have a base image - you can then do builds and flash to your heart's content. 

If you're an internal developer with access to the PVT builds, then you can just use the Taipei b2g_flash_tool [1], which does it all for you in one step.

The thing we can't do is put these builds on our normal, level-1 access try and that is something we can't change, sadly, no matter what we invest in it. It's legal issues that we can't win.

[1] https://github.com/Mozilla-TWQA/B2G-flash-tool
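The three-step workflow in comment 9 can be sketched as an ordered list of commands. This is a dry-run illustration only: every filename, script name, and partition name below is a hypothetical placeholder (the real base image comes from T2M and the real gecko/gaia artifacts from Mozilla's FTP), not the documented procedure.

```python
# Dry-run sketch of the Flame developer workflow from comment 9.
# All filenames, script names and partition names are hypothetical
# placeholders -- consult the actual T2M / FTP artifacts.

def flame_flash_steps(base_image="flame-base.zip",
                      gecko="b2g-flame-gecko.tar.gz",
                      gaia="gaia.zip"):
    """Return the shell commands a developer would run, in order."""
    return [
        # 1. Base build obtained from the partner (T2M) out of band.
        "unzip %s -d base/" % base_image,
        "./base/flash.sh",                   # flash the vendor base image
        # 2. Gaia + gecko downloaded from FTP (pvt.builds for PVT access).
        "tar xzf %s" % gecko,
        "unzip %s" % gaia,
        # 3. Flash the Mozilla bits on top of the base image.
        "adb reboot bootloader",
        "fastboot flash system system.img",
        "fastboot reboot",
    ]


for cmd in flame_flash_steps():
    print(cmd)
```

Once the base image is on the device, only step 3 needs repeating for subsequent local builds, which is why comment 9 says you can then "do builds and flash to your heart's content".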
(Reporter)

Comment 10

4 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #9)
> The thing we can't do is put these builds on our normal, level-1 access try
> and that is something we can't change, sadly, no matter what we invest in
> it. It's legal issues that we can't win.

Indeed, but what we can invest in is this:

(In reply to Ed Morley [:edmorley UTC+0] from comment #8)
> or by having certain try job types that don't upload builds to a public location

And in fact, this is what we already do for our mozilla-central B2G device image builds (Flame included) and also our Tryserver B2G emulator builds; e.g. the "go to build location" link on TBPL links to:
http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-flame_eng/1401813003/
http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/mtseng@mozilla.com-45af4d4b2a1e/try-emulator-jb/

...where only the log file has been uploaded, and not the binary.

Given that, I don't see why we can't run them on Try with a few patches from releng?

Comment 11

4 years ago
This discussion already happened in bug 1001486 and followup email thread.
(Reporter)

Comment 12

4 years ago
(In reply to Aki Sasaki [:aki] from comment #11)
> This discussion already happened in bug 1001486 and followup email thread.

A combination of disabling uploads for anything but the log for jobs of type X, and fixing bug 617414 once and for all (something we should have done long ago), would surely be enough to mitigate the concerns there? Or are we really suggesting that people are going to try to cat things to the logs?
(Reporter)

Comment 13

4 years ago
Or even, if need be: upload everything including the log to pvt.builds.* - with the only publicly accessible info being the buildbot job properties (e.g. result, machine used, start/end time, log URL that points to an LDAP-protected location).

Comment 14

4 years ago
What you're proposing here Ed is the classic cyber security problem of wall building. It can be done successfully if you can ensure control of all inputs and outputs which isn't something we can do with 100% confidence on try as it exists today. The lawyers see this as less of an engineering problem and more one of mitigating the risk. We could spend lots of time and energy building the perfect wall, and all it takes is one hole in it to break contracts and end our entire Firefox OS initiative. That's how legal sees the stakes here.

So, even if we build the wall, it's unlikely we'll ever pass security muster. And if we build the wall, is it even useful to the developers trying to use these builds? 

It's unfortunate that we didn't get a reference phone with an entirely open stack. But it's our first run at this out of the gate, and we will do another reference phone and I intend for us to learn from many of these mistakes. The best way forward here is really to work toward making these things more sheriffable with UI automation tests running in emulators and a sanity test running per push on real devices.
Comment 15

4 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #14)
> What you're proposing here Ed is the classic cyber security problem of wall
> building. It can be done successfully if you can ensure control of all
> inputs and outputs which isn't something we can do with 100% confidence on
> try as it exists today. The lawyers see this as less of an engineering
> problem and more one of mitigating the risk. We could spend lots of time and
> energy building the perfect wall, and all it takes is one hole in it to
> break contracts and end our entire Firefox OS initiative. That's how legal
> sees the stakes here.

Are we concerned about protecting ourselves from bad actors and/or mistakes? Or is this just about perception?

We're already in a situation where a bad or irresponsible actor can build and upload Flame builds publicly. Try machines have access to all the necessary bits. Try jobs execute arbitrary code. They can do whatever the heck they want. Not having Flame builds on try does not protect us against bad actors, regardless of what they upload by default.

If we're concerned about, for example, someone accidentally committing a change that puts bits somewhere we don't want them, that is also unlikely. Most upload information is stored in Mozharness, not in Gecko. If we do as Ed suggests and only upload logs publicly, someone would purposely have to put the names of files containing secrets into https://mxr.mozilla.org/mozilla-central/source/b2g/config/flame/config.json#13 for bad things to happen.

With all of that said, I'm getting the impression that the sticking point here is the fact that we're talking about something that says "Flame" on it - not real technical or security reasons. If that's the call, then that's the call - but I think it sucks that we're hurting developer productivity for that.
(Reporter)

Comment 16

4 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #14)
> What you're proposing here Ed is the classic cyber security problem of wall
> building. It can be done successfully if you can ensure control of all
> inputs and outputs which isn't something we can do with 100% confidence on
> try as it exists today. The lawyers see this as less of an engineering
> problem and more one of mitigating the risk. We could spend lots of time and
> energy building the perfect wall, and all it takes is one hole in it to
> break contracts and end our entire Firefox OS initiative. That's how legal
> sees the stakes here.
> 
> So, even if we build the wall, it's unlikely we'll ever pass security
> muster. And if we build the wall, is it even useful to the developers trying
> to use these builds? 

I think this is a slight misunderstanding of the current situation:

1) We already have multiple risks & have already "built a wall":
  ** Users with level 3 access can modify buildbotcustom and/or a trunk repository that does run Flame builds.
  ** Try already has access to the private source repos (aiui, given comment 15), so whether we run the flame builds or not - a bad actor can still upload source somewhere.

2) While I agree that we don't control _all_ the inputs and outputs:
  ** Many of the inputs are in level 3 repos (buildbotcustom, buildbot-configs, mozharness, tools) - and given #1 we must think that level 3 is sufficient protection.
  ** For the one that isn't (try's gecko checkout), given level 1 people can already access private source on non-flame builds should they wish (see #1 / comment 15), then this doesn't really change anything.
  ** A combination of fixing bug 617414 at the hardware firewall level (which can't be affected by a checkin) & limiting all uploads in buildbotcustom (per comment 12 / comment 13) means the inputs a low-to-medium-skill level 1 bad actor controls wouldn't be sufficient to cause a problem. And for a highly skilled bad actor, #1 already means they can do what they want regardless, even without us running Flame builds on try.

3) "And if we build the wall, is it even useful to the developers trying to use these builds?"
  ** Yes - because per comment 10 to comment 13, we'd still have the job result available on TBPL - and iff the build fails, the dev can use the "go to directory" link on TBPL to visit the pvt.builds.* location behind LDAP. I imagine most/all people working on Flame device changes that are pushing to try, will be employees with access to that location.

-> Given that we already have a "can-never-be-100%-perfect wall" and that running Flame builds doesn't change what both low and high skill bad actors can already achieve - our contracts are no more at risk than they already are at present.

However as Ben said, it seems like perhaps the above is all moot, because we're talking about "Flame".

> The best way forward here is really to work toward making these
> things more sheriffable with UI automation tests running in emulators and a
> sanity test running per push on real devices.

I don't see how this will help, unfortunately - if we can't build on Try, then we can't run tests on Try either. If developers say it's too much of a pain to build Flame locally, then adding automated tests doesn't help with that (fixing bootstrap scripts, build times, wiki docs & education will). And running tests only on trunk still means we'll find most of the failures only after someone lands, not before. Combined with us (at least initially) having only 30 real devices running tests, I think the size of the regression ranges is going to be pretty painful.
(Reporter)

Comment 17

4 years ago
These are now failing on aurora; hidden for parity with m-c:
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=flame&showall=1
(Reporter)

Comment 18

4 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #17)
> This are now failing on aurora; hidden for parity with m-c:
> https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=flame&showall=1

Failure:
https://tbpl.mozilla.org/php/getParsedLog.php?id=41423145&tree=Mozilla-Aurora

This would have been avoided by these:
Bug 910745 - Third party repositories listed in b2g-manifest should always reference a tag/revision
Bug 1012618 - B2G release builds should not pull master of external repositories
(Reporter)

Comment 19

4 years ago
Since I was asked on IRC - to clarify:
Flame builds are hidden in the default view on TBPL; but you can still see them by appending &showall=1 to the URL. Optionally also use TBPL's filter (either through the UI, or by appending &jobname=flame) to show just the jobs of interest.
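The filter parameters described above compose as ordinary query-string arguments; a small sketch of how the resulting TBPL URLs are built:

```python
# Build a TBPL URL that reveals hidden Flame jobs, per comment 19.
from urllib.parse import urlencode


def tbpl_url(tree, jobname=None, showall=True):
    params = {"tree": tree}
    if showall:
        params["showall"] = 1          # include jobs hidden by default
    if jobname:
        params["jobname"] = jobname    # filter to matching job names
    return "https://tbpl.mozilla.org/?" + urlencode(params)


print(tbpl_url("Mozilla-Aurora", jobname="flame"))
# -> https://tbpl.mozilla.org/?tree=Mozilla-Aurora&showall=1&jobname=flame
```

This reproduces the Mozilla-Aurora link from comment 17 (modulo parameter order).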
(Reporter)

Updated

4 years ago
Depends on: 1023857
(Reporter)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX