Closed Bug 1203552 Opened 4 years ago Closed 4 years ago

Work around for BBB tasks not uploading properties.json

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox45 fixed)

RESOLVED FIXED
Tracking Status
firefox45 --- fixed

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(3 files, 1 obsolete file)

This will be useful for this bug:
* Bug 1203085 - Help buildbot test jobs scheduled through BBB to find the installer and test packages it needs

Without it, I would have to list the artifacts for a build and have to do file matching per platform.
With this, I can instead look for the following keys:
06:41:29     INFO -  u'packageUrl': u'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1441889493/firefox-43.0a1.en-US.linux-i686.tar.bz2',
06:41:29     INFO -  u'testPackagesUrl': u'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1441889493/test_packages.json',
There is also graphs.json with the TC urls:

06:56:36     INFO - Dumping config to /builds/slave/m-in-lx-0000000000000000000000/graph_props.json.
06:56:36     INFO -                 u'packageUrl': u'https://queue.taskcluster.net/v1/task/DthChfLJQvqV_7VeT2YVzQ/artifacts/public/build/firefox-43.0a1.en-US.linux-i686.tar.bz2',
06:56:36     INFO -                 u'symbolsUrl': u'https://queue.taskcluster.net/v1/task/DthChfLJQvqV_7VeT2YVzQ/artifacts/public/build/firefox-43.0a1.en-US.linux-i686.crashreporter-symbols.zip',
06:56:36     INFO -                 u'testPackagesUrl': u'https://queue.taskcluster.net/v1/task/DthChfLJQvqV_7VeT2YVzQ/artifacts/public/build/test_packages.json',

<mshal> armenzg: well you can probably just add a line like this to get it: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#1500
This push uploaded graph_props.json to TaskCluster:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f65acaa0157

Artifact:
https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/MEhFQCBdTWmiZ4k4MklcLg/0/public/build/graph_props.json

dustin: if this file is uploaded for a build, all the metadata becomes available to test taskcluster tasks.
Summary: Upload mach_build_properties.json → Upload graph_props.json for all builds
Yeah, but that's too late to incorporate it into the task definition, since that's created when the task graph is created.  Relying on these properties would mean that (a) buildbot properties live on after buildbot and (b) the test tasks depend on there having been a build task, and as such can't be created independently.

Ideally, test tasks have a task definition that basically says "run tests X on binary Y".  Changing that to "run tests X on the binary created by task Z" is more complex, so I'd like to avoid it.
Bug 1203552 - Upload graph_props.json. r=mshal
Attachment #8659428 - Flags: review?(mshal)
I'm not planning on doing it during task definition.

For now I'm going to ahead and request uploading the properties I need until your bug lands (since I don't know how long will that bug take).

We can revert this once your bug sticks. My code is easy to modify to follow your approach plus I get you some leg work in Mozharnes until your changes land.
Comment on attachment 8659428 [details]
MozReview Request: Bug 1203552 - Upload graph_props.json. r=mshal

I don't need this patch.

Every task which represents a Buildbot job uploads a properties.json:
https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/KrVnqlYAR7msaUtyxHTWWg/0/public/properties.json
Attachment #8659428 - Flags: review?(mshal)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
For some reason we have stopped uploading the properties.json file
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Attachment #8659428 - Attachment is obsolete: true
Bug 1221091 for properties.json not working anymore.

I should have a patch by the EOD.
Assignee: nobody → armenzg
mshal's code uploads artifacts to task that gets created at runtime.
If using BBB, we already have a task associated to the Buildbot job.
This task is indicated as a Buildbot property called 'taskId'.
If 'taskID' is found in the properties, we know that the Buildbot job was triggered thanks to BBB.
In that case, we tell mshal's code to use that taskId instead of creating a new one.

The other code change I'm proposing generates a buildbot_properties.json file which we upload right after all the artifacts are uploaded.
Any more Buildbot properties which are set after that point will not be reflected inside of this buildbot_properties.json.
The only consumer for this will be Buildbot jobs triggered through the Buildbot Bridge.
Summary: Upload graph_props.json for all builds → Upload buildbot_properties.json for all builds
Depends on: 1221690
mshal, I tried to tell your system to use the taskId which BBB associates to a Buildbot job.
Unfortunately, it seems that there is a properly placed security measure to prevent hijacking and already claimed task [1].

Instead of uploading artifacts to an already claimed task I will try to pass an ID for an unclaimed task.

I've also considered using the index [2] but I'm not sure if that would work or how it would work with a second build running on the same push.


[1]
12:14:18     INFO - Taskcluster taskId: Y3gtbOx9TJmuz8MV6_hP8A
12:14:18     INFO - Routes: [u'index.gecko.v2.try.revision.3a3fbc83aad2.firefox.linux64-debug', u'index.gecko.v2.try.pushdate.2015.11.04.20151104194615.firefox.linux64-debug', u'index.gecko.v2.try.latest.firefox.linux64-debug', 'index.buildbot.branches.try.linux64-debug', 'index.buildbot.revisions.3a3fbc83aad2.try.linux64-debug']
12:14:18     INFO - Starting new HTTPS connection (1): queue.taskcluster.net
12:14:20     INFO - Running post-action listener: influxdb_recording_post_action
12:14:20     INFO - Resetting dropped connection: goldiewilson-onepointtwentyone-1.c.influxdb.com
12:14:20    FATAL - Uncaught exception: Traceback (most recent call last):
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/base/script.py", line 1718, in run
12:14:20    FATAL -     self.run_action(action)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/base/script.py", line 1660, in run_action
12:14:20    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/base/script.py", line 1601, in _possibly_run_method
12:14:20    FATAL -     return getattr(self, method_name)()
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/mozilla/building/buildbase.py", line 1600, in upload_files
12:14:20    FATAL -     property_conditions=property_conditions)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/mozilla/building/buildbase.py", line 1490, in _taskcluster_upload
12:14:20    FATAL -     task = tc.create_task(routes)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/scripts/mozharness/mozilla/taskcluster_helper.py", line 65, in create_task
12:14:20    FATAL -     }, taskId=self.task_id)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/build/venv/lib/python2.7/site-packages/taskcluster/client.py", line 455, in apiCall
12:14:20    FATAL -     return self._makeApiCall(e, *args, **kwargs)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/build/venv/lib/python2.7/site-packages/taskcluster/client.py", line 232, in _makeApiCall
12:14:20    FATAL -     return self._makeHttpRequest(entry['method'], route, payload)
12:14:20    FATAL -   File "/builds/slave/try-l64-d-00000000000000000000/build/venv/lib/python2.7/site-packages/taskcluster/client.py", line 424, in _makeHttpRequest
12:14:20    FATAL -     superExc=rerr
12:14:20    FATAL - TaskclusterRestFailure: taskId: Y3gtbOx9TJmuz8MV6_hP8A already used by another task

[2] https://tools.taskcluster.net/index/#gecko.v2.try.revision.c3820955e3783c71f063875cafd48d49c5ffa838.firefox.linux64-opt/gecko.v2.try.revision.c3820955e3783c71f063875cafd48d49c5ffa838.firefox.linux64-opt
Bug 1203552 - Upload buildbot_properties.json for all builds

There are now Buildbot jobs which can be triggered through the Buildbot Bridge.
Buildbot test jobs triggered by BBB builds do not receive the installer and test
urls thanks to a sendchange, hence, failling to run.

In order to pass the installer and test urls to Buildbot test jobs (which run
with --read-buildbot-configs) we have added functionality to find these
build artifacts by querying TaskCluster.

Up until last quarter this worked because the Buildbot Bridge uploaded a proprties.json
file for every task it created to mirror a Buildbot job (bug 1221091).

Since that is broken, I have found that we can generate a buildbot_properties.json
file and upload it. The current upload mechanism for Buildbot build jobs creates a
TC task and uploads the artifacts. In this code change, we determine when a Buildbot
job is being driven by a task on TC and tell the upload system to use that task instead
of creating a new one.

By doing that, tools like Mozilla CI tools can schedule TC graphs out of band by
specifying 'parent_task_id' and getting to the buildbot_properties.json file uploaded
by the parent build job.
I've posted my patches for now.
I would prefer bug 1221091 to be fixed, however, if it comes to it I would prefer to land this so we can get adusca moving forward until bug 1221091 is fixed.
Last changes required to make the new approach work.
Attachment #8683324 - Flags: review?(mshal)
Comment on attachment 8683324 [details]
MozReview Request: Bug 1203552 - Upload buildbot_properties.json for all builds

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/24279/diff/1-2/
Attachment #8683844 - Flags: review?(mshal)
Comment on attachment 8683844 [details]
MozReview Request: Last changes required to make the new approach work.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/24445/diff/1-2/
Hi mshal,
This is working on https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1e682c00ff3
I will do a follow up if bug 1221091 ever gets addressed.

So far it seems that everyone is busy to look into it!

We can see in the build that two taskId are in the Buildbot properties:
11:21:18     INFO -         "taskId": "KhRg8sEpRfy1AdguTAlFSQ", 
11:21:18     INFO -         "upload_to_task_id": "WCfk_q4yTJCFw0rC9K9CaQ", 

The artifacts are uploaded to here:
https://queue.taskcluster.net/v1/task/WCfk_q4yTJCFw0rC9K9CaQ/artifacts
and the buildbot_properties.json in here:
https://queue.taskcluster.net/v1/task/WCfk_q4yTJCFw0rC9K9CaQ/artifacts/public/build/buildbot_properties.json

The Mozharness code changes allows the test job to find packageUrl and testPackagesUrl through buildbot_properties.json instead of properties.json.
Here's the accompanying Mozci changes:
https://github.com/mozilla/mozilla_ci_tools/pull/371
Blocks: 1194830
No longer blocks: 1203085
Summary: Upload buildbot_properties.json for all builds → Work around BBB tasks not uploading properties.json
Duplicate of this bug: 1220840
Blocks: 1194264
Summary: Work around BBB tasks not uploading properties.json → Work around for BBB tasks not uploading properties.json
Comment on attachment 8683844 [details]
MozReview Request: Last changes required to make the new approach work.

https://reviewboard.mozilla.org/r/24445/#review21955

This patch looks good, though I think you can just fold it into the previous.
Attachment #8683844 - Flags: review?(mshal) → review+
Comment on attachment 8683324 [details]
MozReview Request: Bug 1203552 - Upload buildbot_properties.json for all builds

https://reviewboard.mozilla.org/r/24279/#review21953

Looks pretty close! Just wondering about these two issues.

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1271
(Diff revision 1)
> -        # TODO it would be better to grab all the properties that were
> +        graph_props_path = os.path.join(c['base_work_dir'], "graph_props.json")

So we are generating this file twice with different filenames, correct? Is there a reason we can't just generate it once and use a single filename?

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1481
(Diff revision 1)
> +            task_id=self.buildbot_config['properties'].get('upload_to_task_id'),

When we are using an already existing task_id like this, I think we want to avoid doing create_task(), claim_task(), and report_completed(). Doesn't BBB already have the task created? Or does it just generate an ID and it's still up to mozharness to create & finalize the task?
Attachment #8683324 - Flags: review?(mshal)
https://reviewboard.mozilla.org/r/24279/#review21953

> So we are generating this file twice with different filenames, correct? Is there a reason we can't just generate it once and use a single filename?

It is *only* generated a second time for certain types of builds (perhaps PGO only) that post to the graph server.
In any case, I was somewhat concerned that I would expose new issues by using a file that is generated at an earlier point in time.

Do you mind if I land as-is and follow up later? I don't want this work to get backed out because the code for talos is executed differently than try.

> When we are using an already existing task_id like this, I think we want to avoid doing create_task(), claim_task(), and report_completed(). Doesn't BBB already have the task created? Or does it just generate an ID and it's still up to mozharness to create & finalize the task?

There is a task associated to the BB job which BBB has created, however, I can't give to your code the taskId of that task as it claimed by BBB.
This is why I only pass an ID of a non-existent task.
In other words, you got it right!

Can I hit "drop" for this raised issue?
(In reply to Armen Zambrano Gasparnian [:armenzg] from comment #22)
> https://reviewboard.mozilla.org/r/24279/#review21953
> 
> > So we are generating this file twice with different filenames, correct? Is there a reason we can't just generate it once and use a single filename?
> 
> It is *only* generated a second time for certain types of builds (perhaps
> PGO only) that post to the graph server.
> In any case, I was somewhat concerned that I would expose new issues by
> using a file that is generated at an earlier point in time.
> 
> Do you mind if I land as-is and follow up later? I don't want this work to
> get backed out because the code for talos is executed differently than try.

Sure thing.

> > When we are using an already existing task_id like this, I think we want to avoid doing create_task(), claim_task(), and report_completed(). Doesn't BBB already have the task created? Or does it just generate an ID and it's still up to mozharness to create & finalize the task?
> 
> There is a task associated to the BB job which BBB has created, however, I
> can't give to your code the taskId of that task as it claimed by BBB.
> This is why I only pass an ID of a non-existent task.
> In other words, you got it right!
> 
> Can I hit "drop" for this raised issue?

What happens if you just use the taskId from BBB and call createArtifact from mozharness? The failure in #c10 looks like you still have mozharness doing createTask, which it should be skipping (along with claimTask and reportCompleted) in this case.
(In reply to Michael Shal [:mshal] from comment #23)
> (In reply to Armen Zambrano Gasparnian [:armenzg] from comment #22)
> > https://reviewboard.mozilla.org/r/24279/#review21953
> > 
> > > So we are generating this file twice with different filenames, correct? Is there a reason we can't just generate it once and use a single filename?
> > 
> > It is *only* generated a second time for certain types of builds (perhaps
> > PGO only) that post to the graph server.
> > In any case, I was somewhat concerned that I would expose new issues by
> > using a file that is generated at an earlier point in time.
> > 
> > Do you mind if I land as-is and follow up later? I don't want this work to
> > get backed out because the code for talos is executed differently than try.
> 
> Sure thing.
> 
Filed as bug 1223464.

> > > When we are using an already existing task_id like this, I think we want to avoid doing create_task(), claim_task(), and report_completed(). Doesn't BBB already have the task created? Or does it just generate an ID and it's still up to mozharness to create & finalize the task?
> > 
> > There is a task associated to the BB job which BBB has created, however, I
> > can't give to your code the taskId of that task as it claimed by BBB.
> > This is why I only pass an ID of a non-existent task.
> > In other words, you got it right!
> > 
> > Can I hit "drop" for this raised issue?
> 
> What happens if you just use the taskId from BBB and call createArtifact
> from mozharness? The failure in #c10 looks like you still have mozharness
> doing createTask, which it should be skipping (along with claimTask and
> reportCompleted) in this case.

I did not try. I assumed that we can't upload artifacts from a client that has not started the task to begin with.

Would you like me to investigate it?
@adusca: you can find a working package in here:
https://pypi.python.org/pypi/mozci/0.19.1.dev1

This is what I believe to be last piece of work needed (if you have time let me know; otherwise I will be pushing for December):
https://github.com/mozilla/mozilla_ci_tools/issues/376
https://hg.mozilla.org/mozilla-central/rev/f0e6e0ae362d
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.