bbb: reportException malformed-payload instead of cancelling a task that doesn't meet our spec

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
3 years ago
18 days ago

People

(Reporter: selenamarie, Assigned: bhearsum)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

After talking with Jonas, it became clear that we had a mechanism for rejecting "ill formed" tasks -- reportException!

The idea is that we reportException instead of cancelling so that the task becomes un-retriable, and we can upload a log with the details of the error at the same time.
(Assignee)

Comment 1

3 years ago
Can we reportException on unclaimed tasks? Last I heard we couldn't...
Flags: needinfo?(jopsen)
@bhearsum,
No why would want to do that?

You claim the task, then if it's
 - "ill formed",
 - missing scopes, or, 
 - missing properties in task.payload,
you do:
 1) upload a log artifact for the task explaining why,
 2) reportException with reason: malformed-payload

And that is it...

----
There is a JSON schema package for python here:
  https://pypi.python.org/pypi/jsonschema
I strongly recommend that define a JSON schema for the task.payload of your tasks,
And validate it with the JSON schema, these schemas also makes for great documentation.
Pro tip:
 - Use additionalProperties: false (to reserve all unused properties for future use)
 - Specify required: [...] for properties that are important to you
 - Specify both type: string and enum: [...] when making enums
 - Add a description for all properties, use markdown for description.
 - Write JSON schemas in YAML, it's much prettier and you can do: description: | \n...
Flags: needinfo?(jopsen)
(Assignee)

Comment 3

3 years ago
I merged and deployed https://github.com/mozilla/buildbot-bridge/pull/2 today, which implements task validation with a jsonschema. Should be simple to fix this bug now, I'll try to get to it soon.
Assignee: nobody → bhearsum
(Assignee)

Comment 4

3 years ago
Created attachment 8653550 [details] [review]
claim + resolve instead of cancel

This patch turned out to be relatively simple - just s/cancelTask/ with the two new calls. I also needed to start passing worker group/id to the TCListener, because they're needed when claiming. A couple of other things snuck in too:
* s/assertEquals/assertEqual/ (the former is deprecated)
* Adjust "WEIRD" log message that currently spams the logs whenever a non-bridge Buildbot job starts.

I want to do more testing of this once I get the dev environment set-up in https://bugzilla.mozilla.org/show_bug.cgi?id=1199247, but I think it's ready for review in the meantime.
Attachment #8653550 - Flags: review?(jopsen)
(Assignee)

Comment 5

3 years ago
Created attachment 8653589 [details] [diff] [review]
add worker group/id to top level of config

These are now shared by tclistener and bblistener. I don't want to remove them from bblistener's config yet because I'll be testing out the bbb patch in dev (when it exists) before deploying to prod. Once the patch makes it to dev+prod I'll rip them out from the old location.
Attachment #8653589 - Flags: review?(rail)
Attachment #8653589 - Flags: review?(rail) → review+
Comment on attachment 8653550 [details] [review]
claim + resolve instead of cancel

Looks good, see comment on PR for suggestions.
Pretty sure they are unimportant.
Attachment #8653550 - Flags: review?(jopsen) → review+
(Assignee)

Updated

3 years ago
Attachment #8653589 - Flags: checked-in+
(Assignee)

Updated

3 years ago
Attachment #8653550 - Flags: checked-in+
(Assignee)

Comment 7

3 years ago
This worked in dev, eg: https://tools.taskcluster.net/task-graph-inspector/#nQnVW-adTCOB_g5TeS6VyQ/BE8VNC2UQi6KnfIfO66KzA/0 was resolved with malformed payload instead of cancelled. This should go into production by early next week.
(Assignee)

Comment 8

3 years ago
Deployed to production today.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Component: General Automation → General
Product: Release Engineering → Release Engineering
You need to log in before you can comment on or make changes to this bug.