RESOLVED FIXED

Status

Tree Management
Treeherder: API
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bc, Assigned: wlach)

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
[12:57] <        bc> known issue on staging with 503 Service Unavailable posting to 
          https://treeherder.allizom.org/api/project/mozilla-central/jobs/ ?
[13:54] <&     wlach> bc: no, let me take a look
[13:54] <&     wlach> thanks for bringing it up
[14:02] <&     wlach> bc: could you file a bug? I need to run but I see the problem -- you are 
          submitting a new reference data signature, so it's trying to update the exclusion 
          profiles, but that operation is timing out
[14:03] <&     wlach> I think we might be able to get around that by switching to identifying 
          exclusion profiles by id instead of hash
Looking at New Relic, it seems like it's timing out in update_flat_exclusions() in treeherder/model/models.py (around line 575).

I don't think this is a new problem, but we evidently need to make this operation faster somehow. Perhaps switching from matching on hash strings to integers might help. I'll investigate tomorrow.
Assignee: nobody → wlachance
I think bc is submitting to both stage and prod, so if this wasn't a new problem, it would be affecting both?
(Reporter)

Comment 3

2 years ago
I am currently burning in some new Pixel devices and checking out their behavior on all of the tests I have defined. See:

https://treeherder.allizom.org/#/jobs?repo=mozilla-central&filter-searchStr=autophone&tochange=28eb27fd7d141a8f9f2a6dbfae892ad78ecb40a5&fromchange=05328d3102efd4d5fc0696489734d7771d24459f

I haven't run the full Mochitest or Reftest suites in some time.
Created attachment 8815875 [details]
Another failure

I did some more investigation and it looks like it failed in another spot here, though I doubt the failure has anything to do with that specific operation; more likely it just happened to time out there because the overall operation takes too long. It looks like we're basically rewriting the flat exclusion profile information for *every* repository, even if a new type of job is only added to one.

I think we ought to just do away with storing "flat exclusion profiles" in the database altogether. If you know the project + exclusion profile name you want, this information can be retrieved fairly quickly as part of another get operation.
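
A minimal sketch of the difference, not Treeherder's actual code. The data model below (an `ExclusionProfile` with a name, a set of repositories and a set of excluded job signatures) and both function names are hypothetical, chosen only to contrast rebuilding a flat table for every repository with deriving the list for one project when it is asked for.

```python
from dataclasses import dataclass, field


@dataclass
class ExclusionProfile:
    # Hypothetical stand-in for an exclusion profile: which job signatures
    # it hides, and on which repositories it applies.
    name: str
    repositories: set = field(default_factory=set)
    signatures: set = field(default_factory=set)


def rebuild_flat_exclusions(profiles, all_repositories):
    """Eager approach: recompute a flat entry for *every* repository."""
    flat = {}
    for profile in profiles:
        for repo in all_repositories:
            if repo in profile.repositories:
                flat[(profile.name, repo)] = sorted(profile.signatures)
            else:
                flat[(profile.name, repo)] = []
    return flat


def exclusions_for(profiles, project, profile_name):
    """On-demand approach: compute the list for one project only."""
    for profile in profiles:
        if profile.name == profile_name and project in profile.repositories:
            return sorted(profile.signatures)
    return []


profiles = [
    ExclusionProfile("default", {"mozilla-central"}, {"sig-b", "sig-a"}),
]
repos = ["mozilla-central", "mozilla-inbound", "try"]

# The eager rebuild produces one entry per (profile, repository) pair.
flat = rebuild_flat_exclusions(profiles, repos)

# The on-demand lookup only does the work for the project being queried.
on_demand = exclusions_for(profiles, "mozilla-central", "default")
```

The eager version's work grows with the number of repositories on every new signature, while the on-demand version does the same amount of work per request regardless of how many repositories exist.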
Created attachment 8815876 [details] [review]
[treeherder] wlach:1320805 > mozilla:master
Comment on attachment 8815876 [details] [review]
[treeherder] wlach:1320805 > mozilla:master

I'm pretty sure this should fix the problem. See PR for details.

We probably want to land this in stages -- the db migration should wait until we're sure we don't want to back out.
Attachment #8815876 - Flags: review?(emorley)
Comment on attachment 8815876 [details] [review]
[treeherder] wlach:1320805 > mozilla:master

Sorry for the delay. This looks fine to me, though I'm less familiar with the nuances of the exclusion profile handling than Cameron.

The PR will need rebasing to update the migration name/dependency, since there are new migrations on master since.
Attachment #8815876 - Flags: review?(emorley) → review+

Comment 8

2 years ago
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/f9a04c816f828d1272c4c8f9bf39fb19cbdb1faa
Bug 1320805 - Create per-project exclusion lists on-demand

The way we had written things, a "flat exclusion" information (which
is essentially just a cache) had to be rewritten for *every* repository
on submission of a new job, even though the new job would apply to
only one repository. Let's fix this by just calculating this information
on demand (most of the time this should be very fast, as we already
store the final data in memcache)

https://github.com/mozilla/treeherder/commit/0070e189b003420f676280293ad6b5b092eba1d3
Bug 1320805 - Remove flat exclusion column from db
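
The "calculate on demand, usually served from memcache" idea in the first commit message can be sketched as a cache-aside lookup. This is an illustration under assumptions, not the code from the commits: the class name, the `compute` callback, and the plain dict standing in for memcache are all hypothetical.

```python
class OnDemandExclusions:
    """Cache-aside lookup: compute a project's exclusion list on first use."""

    def __init__(self, compute):
        self._compute = compute  # hypothetical function doing the real work
        self._cache = {}         # plain dict standing in for memcache
        self.misses = 0          # counts how often we had to recompute

    def get(self, project, profile_name):
        key = (project, profile_name)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute(project, profile_name)
        return self._cache[key]

    def invalidate(self):
        # Called when exclusion data changes; nothing is rewritten eagerly,
        # the next get() for each key simply recomputes it.
        self._cache.clear()


def compute(project, profile_name):
    # Hypothetical placeholder for the real per-project calculation.
    return [f"{project}:{profile_name}:signature"]


store = OnDemandExclusions(compute)
first = store.get("mozilla-central", "default")
second = store.get("mozilla-central", "default")  # served from the cache
```

With this shape, submitting a new job type only needs to invalidate cached entries; no per-repository rewrite happens on the submission path.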
Seems to be working well on stage; it will make it to production on the next deploy.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED