Adjust hazard builds for s3 migration

RESOLVED FIXED

Status

3 years ago
6 months ago

People

(Reporter: nthomas, Assigned: sfink)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

3 years ago
The linux64-br-haz_mozilla-central_dep job does an ssh mkdir followed by an rsync, e.g.:

17:53:30     INFO - #####
17:53:30     INFO - ##### Running upload-analysis step.
17:53:30     INFO - #####
17:53:30     INFO - Running main action method: upload_analysis
17:53:30     INFO - Uploading the contents of /builds/slave/l64-br-haz_m-cen_dep-000000000/build/upload to stage.mozilla.org:/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036
17:53:30     INFO - Running command: ['mock_mozilla', '-r', u'mozilla-centos6-x86_64', '-q', '--cwd', '/builds/slave/l64-br-haz_m-cen_dep-000000000/build', '--unpriv', '--shell', u'ssh -oIdentityFile=/home/mock_mozilla/.ssh/ffxbld_rsa ffxbld@stage.mozilla.org mkdir -p /pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036'] in /builds/slave/l64-br-haz_m-cen_dep-000000000/build
17:53:30     INFO - Copy/paste: mock_mozilla -r mozilla-centos6-x86_64 -q --cwd /builds/slave/l64-br-haz_m-cen_dep-000000000/build --unpriv --shell "ssh -oIdentityFile=/home/mock_mozilla/.ssh/ffxbld_rsa ffxbld@stage.mozilla.org mkdir -p /pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036"
17:53:31     INFO - Return code: 0
17:53:31     INFO - Running command: ['mock_mozilla', '-r', u'mozilla-centos6-x86_64', '-q', '--cwd', '/builds/slave/l64-br-haz_m-cen_dep-000000000/build/upload', '--unpriv', '--shell', u'rsync -e "ssh -oIdentityFile=/home/mock_mozilla/.ssh/ffxbld_rsa" -azv . ffxbld@stage.mozilla.org:/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036/'] in /builds/slave/l64-br-haz_m-cen_dep-000000000/build/upload
17:53:31     INFO - Copy/paste: mock_mozilla -r mozilla-centos6-x86_64 -q --cwd /builds/slave/l64-br-haz_m-cen_dep-000000000/build/upload --unpriv --shell "rsync -e \"ssh -oIdentityFile=/home/mock_mozilla/.ssh/ffxbld_rsa\" -azv . ffxbld@stage.mozilla.org:/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036/"
17:53:32     INFO -  sending incremental file list
17:53:32     INFO -  ./
17:53:32     INFO -  allFunctions.txt.gz
17:53:33     INFO -  gcFunctions.txt.gz
17:53:33     INFO -  gcTypes.txt.gz
17:53:33     INFO -  hazards.txt.gz
17:53:33     INFO -  refs.txt.gz
17:53:33     INFO -  rootingHazards.txt.gz
17:53:33     INFO -  unnecessary.txt.gz
17:53:33     INFO -  sent 20633649 bytes  received 148 bytes  8253518.80 bytes/sec
17:53:33     INFO -  total size is 20626210  speedup is 1.00
17:53:33     INFO - Return code: 0
17:53:33     INFO - TinderboxPrint: upload <a title='hazards_results' href='https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-br-haz/20151003155036'>results</a>: complete

We'll need to convert to post_upload (and lose the human-friendly date-based directory names), or upload to S3 directly.
(Reporter)

Updated

3 years ago
Blocks: 1186297
(Reporter)

Comment 1

3 years ago
Or just move them to taskcluster.

sfink, the issue is that we're moving the ftp server to S3. There will be an upload host that looks like stage.m.o, so you can scp files up to /tmp/<tmpdir>, then run post_upload to move them to S3. There won't be any mount/fuse/whatever of S3 for manipulating the files from the upload host, though. So you could:
* convert the hazard builds to use post_upload, and accept the change from YYYYMMDD... style dirs to epoch
* move the builds to taskcluster

What do you think?
Flags: needinfo?(sphink)
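To make the naming change concrete: a YYYYMMDDhhmmss build id, such as the 20151003155036 directory in the log above, corresponds to a single epoch-seconds value. The helper below is a hypothetical sketch in Python (the language mozharness uses), not post_upload's own logic. It assumes the build id is interpreted as UTC; real build ids may be in another timezone, which would shift the result by that offset.

```python
import calendar
import time


def buildid_to_epoch_dir(buildid: str) -> str:
    """Map a YYYYMMDDhhmmss build id to an epoch-seconds directory name.

    Hypothetical helper; ASSUMES the build id is in UTC.
    """
    parsed = time.strptime(buildid, "%Y%m%d%H%M%S")
    # calendar.timegm treats the struct_time as UTC (time.mktime would use local time).
    return str(calendar.timegm(parsed))


if __name__ == "__main__":
    print(buildid_to_epoch_dir("20151003155036"))  # 1443887436
```

The epoch name sorts and compares easily, but a human can no longer read the build date off the directory, which is the trade-off mentioned in the description.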
(Assignee)

Comment 2

3 years ago
(In reply to Nick Thomas [:nthomas] from comment #1)
> Or just move them to taskcluster.
> 
> sfink, the issue is that we're moving the ftp server to S3. There will be an
> upload host that looks like stage.m.o, so that you can scp files up to
> /tmp/<tmpdir>, then run post_upload to move them to S3. There won't be any
> mount/fuse/whatever of S3 to manipulate the files from upload host though.
> So you could
> * convert the hazard builds to use post_upload, and accept the change from
> YYYYMMDD... style dirs to epoch

That would be the only visible change? I see no problems with that at all. There will still be a directory listing all of the files uploaded for that build (and no others)?

> * move the builds to taskcluster

I personally don't know enough about taskcluster to do this. And it would probably have to be done in parts, because there are hazard builds for both the browser and b2g. IIUC, the regular browser build has a TC implementation for it now, but the b2g build does not.

I have wasted a *lot* of time recently because my machine can no longer do mock_mozilla stuff, so I would love to have a docker-based setup for working on b2g. But I know next to nothing about docker or taskcluster, and I really would not want to be the one to port b2g. Dealing with b2g makes me want to saw off my toes with a blunt knife and soak them in a salt and vinegar solution.
Flags: needinfo?(sphink)
(Reporter)

Comment 3

3 years ago
Sorry, I missed this bug and we're migrating in a few hours (Tuesday afternoon Pacific). 

(In reply to Steve Fink [:sfink, :s:] from comment #2)
> (In reply to Nick Thomas [:nthomas] from comment #1)
> > So you could
> > * convert the hazard builds to use post_upload, and accept the change from
> > YYYYMMDD... style dirs to epoch
> 
> That would be the only visible change? I see no problems with that at all.
> There will still be a directory listing all of the files uploaded for that
> build (and no others)?

Yes, I think so. It would be the same as the firefox tinderbox-builds. It would also fix the current mismatch where builds go into 2xxxxxx-style dirs while logs go into epoch-style ones.

https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/packager.mk#211 might provide an easy way to do this. It looks like you wouldn't want the base path or package args, and you can fake out the properties-file one, then list the files you want to upload. Mozharness has a helper function (_query_post_upload_cmd) to generate the command, or you can interpolate it the way configs/single_locale/production.py does.
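The two-step flow from comment 1 (copy files into /tmp/<tmpdir> on the upload host, then run post_upload there) could be assembled roughly as below. This is a minimal, hypothetical sketch: it only builds argument lists in the same style as the commands in the log, the host and tmpdir in the usage are placeholders, and the post_upload command is passed in by the caller because its real arguments are whatever _query_post_upload_cmd or the packager.mk rules produce.

```python
from typing import List, Sequence


def build_upload_commands(
    files: Sequence[str],
    user: str,
    host: str,
    ssh_key: str,
    tmpdir: str,
    post_upload_cmd: Sequence[str],
) -> List[List[str]]:
    """Return [mkdir, scp, post_upload] commands for a copy-then-post_upload upload.

    Hypothetical helper: post_upload_cmd is passed through verbatim because
    its real arguments are not reproduced here.
    """
    if not files:
        raise ValueError("no files to upload")
    key_opt = f"-oIdentityFile={ssh_key}"
    remote = f"{user}@{host}"
    return [
        # Make sure the temporary directory exists on the upload host.
        ["ssh", key_opt, remote, "mkdir", "-p", tmpdir],
        # Copy the artifacts into it.
        ["scp", key_opt, *files, f"{remote}:{tmpdir}/"],
        # Let post_upload move them from the temporary directory to S3.
        ["ssh", key_opt, remote, *post_upload_cmd],
    ]


if __name__ == "__main__":
    for cmd in build_upload_commands(
        ["hazards.txt.gz", "rootingHazards.txt.gz"],
        user="ffxbld",
        host="upload.example.invalid",  # placeholder host
        ssh_key="/home/mock_mozilla/.ssh/ffxbld_rsa",
        tmpdir="/tmp/tmp.hazards",  # placeholder tmpdir
        post_upload_cmd=["post_upload.py"],  # real arguments omitted
    ):
        print(cmd)
```

Returning argument lists rather than shell strings avoids the nested quoting visible in the rsync "Copy/paste" line of the log.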

Updated

3 years ago
Flags: needinfo?(sphink)
sfink, these changes are now live, so hazard builds are failing to upload, causing each one to show as a failure. 

We have two options (until a fix gets deployed one way or the other):
1. Hide the jobs and let the trees get reopened and go on, business as usual (but without any hazard build coverage) until some fix gets deployed.
2. We close the trees until some fix gets deployed.

Which would you choose?
(Reporter)

Comment 5

3 years ago
We are currently in state 2: the integration trees are closed. Is that really the best solution here?
(Assignee)

Comment 6

3 years ago
(In reply to Wes Kocher (:KWierso) from comment #4)
> sfink, these changes are now live, so hazard builds are failing to upload,
> causing each one to show as a failure. 
> 
> We have two options (until a fix gets deployed one way or the other):
> 1. Hide the jobs and let the trees get reopened and go on, business as usual
> (but without any hazard build coverage) until some fix gets deployed.
> 2. We close the trees until some fix gets deployed.
> 
> Which would you choose?

Hazard builds are tier 1, and rightfully so, so I would say that #2 is the right default. But I would further say that there's no huge risk in being without hazard coverage for a little while, so I'd vote for #1. Instead, I'll try for a #1.5: disable the upload step for now. That regains coverage, but any failures will be very difficult to diagnose.
Flags: needinfo?(sphink)
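Option #1.5 amounts to running the job with the upload step dropped from its action list. A minimal sketch, assuming the actions are a plain list of names; only upload-analysis comes from the log above, and the other action names in the usage are hypothetical.

```python
from typing import List, Sequence


def without_action(actions: Sequence[str], skip: str = "upload-analysis") -> List[str]:
    """Return a copy of the action list with one step removed, keeping the order."""
    remaining = [action for action in actions if action != skip]
    if len(remaining) == len(actions):
        # Warn rather than fail: the step may already have been removed.
        print(f"note: action {skip!r} not present, nothing removed")
    return remaining


if __name__ == "__main__":
    # Hypothetical action list; only "upload-analysis" appears in the log.
    actions = ["build", "analyze", "upload-analysis", "check-expectations"]
    print(without_action(actions))
```

Filtering by name keeps the rest of the job untouched, so the analysis still runs and its pass/fail status still counts; only the results are no longer published.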
(Assignee)

Updated

3 years ago
Keywords: leave-open
Duplicate of this bug: 1217814
(Assignee)

Comment 14

3 years ago
This turned out to be a major nuisance when Dexter ran into a tricky hazard that I couldn't spot from code inspection (it was a case where a destructor ran while a GC pointer return value was on the stack, and the analysis thought the destructor could GC). So I just pushed, without discussion or review, a change that at least dumps hazards.txt into the log for now.

Having that output in a try push made the above hazard immediately obvious.

The output isn't pretty, as it can get interleaved with other log output. But it makes the current state of affairs way more tolerable.
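Dumping hazards.txt into the log takes only a few lines of Python. This is a hypothetical sketch, not the change that was actually pushed: prefixing every line is one way to keep the output greppable when it gets interleaved with other log output, and gzip input is handled because the artifacts in the original upload list are .gz files.

```python
import gzip
import sys
from typing import IO, Optional


def dump_to_log(path: str, prefix: str = "hazards.txt: ", out: Optional[IO[str]] = None) -> int:
    """Copy a (possibly gzipped) text file into the log, prefixing every line.

    Returns the number of lines written. The prefix lets the lines be
    grepped back out even when other output is interleaved with them.
    """
    if out is None:
        out = sys.stdout
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            out.write(prefix + line.rstrip("\n") + "\n")
            count += 1
    out.flush()
    return count
```

Usage would be along the lines of `dump_to_log("hazards.txt")` at the end of the analysis step; the path is an assumption about where the file lands.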
(Assignee)

Comment 16

3 years ago
Created attachment 8680400 [details] [diff] [review]
Enable MOZ_UPLOAD_DIR on hazard builds and switch to blobber for uploading artifacts

jlund - I'm not sure if you're the right one for this. The code change is relatively simple: it makes both hazard builds use BlobUploadMixin, for two different reasons. First, so people can throw things into MOZ_UPLOAD_DIR for custom try pushes. I don't expect this part to be controversial at all.

Second, hazard builds used to copy their output files to the upload server. That broke with the switchover to S3, and the replacement functionality looked very difficult to use from these jobs. So I switched them to the blobber server instead. That means each of the automation builds will be sending ~10MB up to the blobber; I have no idea whether that's reasonable or not.

Please redirect the review if appropriate.
Attachment #8680400 - Flags: review?(jlund)
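With MOZ_UPLOAD_DIR support, getting an artifact uploaded comes down to placing it in that directory and letting the blob upload step pick it up. A minimal sketch under that assumption; stage_for_upload is a hypothetical helper, and it also reports the total size, which is the figure that matters for the ~10MB-per-build question.

```python
import os
import shutil
from typing import Iterable, List, Optional


def stage_for_upload(paths: Iterable[str], upload_dir: Optional[str] = None) -> List[str]:
    """Copy existing artifact files into the upload directory.

    ASSUMES a later step uploads everything found in MOZ_UPLOAD_DIR.
    Missing files are skipped so that a partial run still uploads what it has.
    """
    if upload_dir is None:
        upload_dir = os.environ["MOZ_UPLOAD_DIR"]  # KeyError if the job did not set it
    os.makedirs(upload_dir, exist_ok=True)
    staged: List[str] = []
    for path in paths:
        if not os.path.isfile(path):
            continue
        dest = os.path.join(upload_dir, os.path.basename(path))
        shutil.copy2(path, dest)
        staged.append(dest)
    total = sum(os.path.getsize(p) for p in staged)
    print(f"staged {len(staged)} file(s), {total} bytes, in {upload_dir}")
    return staged
```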
(Assignee)

Updated

3 years ago
Assignee: nobody → sphink
Status: NEW → ASSIGNED
(Reporter)

Comment 17

3 years ago
Here's another option: upload the bits to taskcluster the same way the firefox/fennec builds in mozharness do. See https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#1498 for the code. It's a little different because that code relies on 'make upload' to provide some properties, but if you have a fixed list of files to upload, that list could easily be put in the configs.
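A fixed list of files in the configs might look like the sketch below. The key name hazard_upload_files is hypothetical, not an existing mozharness option; the file names are the ones the old rsync step uploaded, per the log in the description. The resolver fails loudly on a missing artifact rather than silently uploading a partial set.

```python
import os
from typing import Dict, List

# Hypothetical config fragment: "hazard_upload_files" is an illustrative key.
config: Dict[str, List[str]] = {
    "hazard_upload_files": [
        "allFunctions.txt.gz",
        "gcFunctions.txt.gz",
        "gcTypes.txt.gz",
        "hazards.txt.gz",
        "refs.txt.gz",
        "rootingHazards.txt.gz",
        "unnecessary.txt.gz",
    ],
}


def files_to_upload(cfg: Dict[str, List[str]], analysis_dir: str) -> List[str]:
    """Resolve configured file names against the analysis output directory."""
    paths = [os.path.join(analysis_dir, name) for name in cfg["hazard_upload_files"]]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"missing hazard artifacts: {missing}")
    return paths
```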
(Reporter)

Comment 18

3 years ago
The idea there is to create fake taskcluster tasks and attach artifacts to them, including indexes for querying. It's a kind of first step toward taskcluster.
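A toy, in-memory model of that idea: a placeholder task record that carries artifacts, plus index routes so the artifacts can be found by namespace. Everything here is hypothetical (the namespace, the public/build/ artifact prefix, the data layout); it illustrates the concept and is neither the taskcluster API nor the buildbase.py code. The only convention it leans on is that index routes are written as index.<namespace>.

```python
from typing import Dict, List, Optional

Task = Dict[str, object]


def make_task_record(task_id: str, namespaces: List[str], files: List[str]) -> Task:
    """Build a placeholder task record with artifacts and index routes."""
    return {
        "taskId": task_id,
        "routes": [f"index.{ns}" for ns in namespaces],
        "artifacts": [f"public/build/{name}" for name in files],
    }


def find_by_namespace(tasks: List[Task], namespace: str) -> Optional[Task]:
    """Return the most recently added task indexed under namespace, if any."""
    route = f"index.{namespace}"
    for task in reversed(tasks):
        if route in task["routes"]:  # type: ignore[operator]
            return task
    return None
```

Querying by namespace rather than by task id is what lets a consumer ask for "the latest hazard results on mozilla-central" without knowing which build produced them.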
(Assignee)

Comment 19

3 years ago
(In reply to Nick Thomas [:nthomas] from comment #17)
> Here's another option - upload the bits to taskcluster similar to the same
> way firefox/fennec builds in mozharness do. See
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> mozilla/building/buildbase.py#1498 for the code. It's a little different
> because that's relying on 'make upload' to provide some properties, but if
> you have a fixed list of files to upload that could be easily put in the
> configs.

Oh, that does indeed look promising. Way simpler than what I was looking at before.

I think I'll split up the patch. I want to support $MOZ_UPLOAD_DIR anyway, so I'll do that separately. Then I'll have the trivial patch that switches the artifacts over to blobber as a separate piece that we could land for now but easily revert or replace with TC uploads.
(Assignee)

Updated

3 years ago
Depends on: 1219880
(Assignee)

Updated

3 years ago
Attachment #8680400 - Attachment is obsolete: true
Attachment #8680400 - Flags: review?(jlund)
(Assignee)

Comment 20

3 years ago
Created attachment 8680800 [details] [diff] [review]
Re-enable the upload step, but take out the stuff that no longer works (as in, everything.)

This will probably come back in the form of S3 uploads at some point.
(Assignee)

Comment 21

3 years ago
Created attachment 8680801 [details] [diff] [review]
Upload hazard artifacts to blobber

nthomas, perhaps you're the right one for this. I'm thinking I should just use the blobber for now, but try out the TC S3 upload stuff as a followup. What do you think?
Attachment #8680801 - Flags: review?(nthomas)
(Assignee)

Updated

3 years ago
Attachment #8680800 - Attachment is obsolete: true
(Assignee)

Comment 22

3 years ago
Minor problem: the above is from buildbase.BuildScript. Neither spidermonkey_build.py nor hazard_build.py inherits from that. spidermonkey_build.py does its own thing and inherits from a huge mess of mixins. hazard_build.py inherits from buildb2gbase.B2GBuildBaseScript.

It was pretty straightforward to reparent spidermonkey_build.py to BuildScript, and that seems like a worthwhile thing to do even though it requires a couple of config keys to be set that are irrelevant for this build.

hazard_build.py (aka the b2g hazard build) is a trickier problem. It really does need to share all of the nasty b2g build goop, and I assume that if hierarchy surgery were easy, it would have been done.

I'll look into it.
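The difference between reparenting and mixing in can be illustrated with stub classes. These are hypothetical stand-ins, not the mozharness classes: the point is only that a mixin listed ahead of the existing base grafts new behaviour onto hazard_build.py's hierarchy without moving it under BuildScript. Per comment 25, bug 1219880 is the change that includes BlobUploadMixin in the scripts.

```python
from typing import List


class B2GBuildBaseScript:
    """Stand-in for buildb2gbase.B2GBuildBaseScript (the shared b2g build goop)."""

    def build(self) -> str:
        return "b2g build done"


class BlobUploadMixin:
    """Stand-in upload mixin: it only relies on attributes its host class sets."""

    upload_files: List[str]

    def upload_artifacts(self) -> List[str]:
        # A real mixin would push these somewhere; the stub just reports them.
        return sorted(self.upload_files)


class HazardBuild(BlobUploadMixin, B2GBuildBaseScript):
    """Keeps its b2g parent and gains upload behaviour by mixing it in."""

    def __init__(self, upload_files: List[str]) -> None:
        self.upload_files = upload_files


if __name__ == "__main__":
    job = HazardBuild(["rootingHazards.txt.gz", "hazards.txt.gz"])
    print(job.build())
    print(job.upload_artifacts())
    print([cls.__name__ for cls in HazardBuild.__mro__])
```

The method resolution order stays HazardBuild, BlobUploadMixin, B2GBuildBaseScript, object, so nothing in the b2g base changes; the cost is that the mixin's expectations (here, upload_files) must be satisfied by the host class.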
(Reporter)

Comment 23

3 years ago
Comment on attachment 8680801 [details] [diff] [review]
Upload hazard artifacts to blobber

As you say, you're not including BlobUploadMixin in your scripts, so this won't be sufficient.
Attachment #8680801 - Flags: review?(nthomas)
(Reporter)

Comment 24

3 years ago
If you're getting into heavy-duty mozharness hacking, then I'd recommend jlund as the reviewer.
(Assignee)

Comment 25

3 years ago
(In reply to Nick Thomas [:nthomas] from comment #23)
> Comment on attachment 8680801 [details] [diff] [review]
> Upload hazard artifacts to blobber
> 
> As you say, you're not including BlobUploadMixin in your scripts, so this
> won't be sufficient.

(doh! Found this comment half-written. I thought I'd submitted it.)

No, this patch depends on bug 1219880, which includes BlobUploadMixin and configures it. This works; see e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e96f2fe32c72

My review request wasn't really for the mozharness change, which is pretty trivial now that it is on top of the MOZ_UPLOAD_DIR support in bug 1219880. My review request to you is really meant to be for the question "is it ok to upload this stuff to blobber, at least temporarily?"

From what I can tell, it is not straightforward to switch these over to S3 uploads. Using the taskcluster stuff looked like it was going to be a good approach, but the b2g builds do not have that code and it looked like it would be a fair bit of work to rejigger the inheritance relationships to add it in. The browser builds (using spidermonkey_build.py) look to be relatively simple to switch to that, since while they do not currently inherit from BuildScript, there's no particular reason why they couldn't. But it seems weird to have half the jobs uploading to blobber and the other half to S3.

Personally, I'd like to do blobber uploads for now, then when someone moves the b2g stuff over to taskcluster I can move the hazard stuff over too. Regardless, I'll probably change spidermonkey_build.py to inherit from BuildScript, even if it continues to upload to blobber for now. (I have a patch for the inheritance thing already. Hopefully it works.)
(Assignee)

Updated

3 years ago
Attachment #8680801 - Flags: review?(nthomas)
(Reporter)

Comment 26

3 years ago
Comment on attachment 8680801 [details] [diff] [review]
Upload hazard artifacts to blobber

(In reply to Steve Fink [:sfink, :s:] from comment #25)
> My review request wasn't really for the mozharness change, which is pretty
> trivial now that it is on top of the MOZ_UPLOAD_DIR support in bug 1219880.
> My review request to you is really meant to be for the question "is it ok to
> upload this stuff to blobber, at least temporarily?"

Yes, that's fine with us. The only slight concern is blobber handling the increased load, so please ping the sheriffs when this lands so they know what to blame if it breaks across the trees.
 
> Personally, I'd like to do blobber uploads for now, then when someone moves
> the b2g stuff over to taskcluster I can move the hazard stuff over too.
> Regardless, I'll probably change spidermonkey_build.py to inherit from
> BuildScript, even if it continues to upload to blobber for now. (I have a
> patch for the inheritance thing already. Hopefully it works.)

FYI, we'll be adding TC uploads to b2g builds in the next week or two, as we need it to stop uploading to the old ftp system. See bug 1222231.
Attachment #8680801 - Flags: review?(nthomas) → review+
(Assignee)

Comment 29

3 years ago
Not sure if the final thing got checked in here, but I'm trying to kill off the non-taskcluster hazard builds, and if there's anything left to fix here, I'm not going to fix it. The TC tasks are using artifacts and are good.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: General Automation → General
Product: Release Engineering → Release Engineering