Closed Bug 1306709 Opened 8 years ago Closed 7 years ago

Move SETA to Treeherder

Categories

(Tree Management Graveyard :: Treeherder: SETA, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(5 files, 6 obsolete files)

I will be looking into turning SETA into a django app.

We will need to define the models and views, as well as write tests for them.

I will start by setting up Treeherder locally, grabbing some Buildbot data from a push, and adjusting the URLconf to point to the SETA app.
Assignee: nobody → armenzg
Attached file SQL statement
No rush on the feedback.

My current django models.py generates this.
Which changes would you like to see?

jmaher: could you please attach the definition for job priorities? I don't think I completely set up ouija/seta locally.

The tables are named starting with "seta_" to make it easier to determine which app they're associated with.
We don't have to keep that prefix.

I've also renamed `branch` to `repo_name`.
In taskrequests I've removed `id` and made `repo_name` the PK.

I have this definition for taskrequests [1]

CREATE TABLE `taskrequests` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `branch` varchar(128) NOT NULL,
  `counter` int(11) NOT NULL,
  `datetime` datetime NOT NULL,
  `reset_delta` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_taskrequests_branch` (`branch`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
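For comparison, a sketch of what the revised definition described above could look like (illustrative only — `repo_name` as the primary key, no auto-increment `id`, and the "seta_" prefix; the actual Django migration will generate its own DDL):

```sql
CREATE TABLE `seta_taskrequests` (
  `repo_name` varchar(128) NOT NULL,
  `counter` int(11) NOT NULL,
  `datetime` datetime NOT NULL,
  `reset_delta` int(11) NOT NULL,
  PRIMARY KEY (`repo_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```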
Attachment #8798063 - Flags: feedback?(jmaher)
Comment on attachment 8798063 [details]
SQL statement

this looks correct; is there something missing, which is why you are asking me to update job priorities?
Attachment #8798063 - Flags: feedback?(jmaher) → feedback+
I wanted an sql dump to compare.
I'm following the same structure as models.py from ouija so it should be about the same.
this is a new table for the taskcluster scenario- technically we do not have this table on alertmanager or in MySQL.
Attached file Diff showing current changes (obsolete) —
Comment on attachment 8798531 [details]
Diff showing current changes

* Enabled a seta django app inside of Treeherder
* Enabled seta specific url configurations
* Models' migration
* Added some models and tests for them
** Not all tables from SETA will move forward
** TaskRequests has no data on seta-dev on Heroku for me to check for data
* Create a couple of hello world views
Attachment #8798531 - Flags: feedback?(wlachance)
Comment on attachment 8798531 [details]
Diff showing current changes

In the future could you attach the pull request instead of the diff? Follow the procedure here:

http://globau.wordpress.com/2013/10/21/github-pull-requests-and-bugzilla/

Anyhoo, good start. :) See PR for comments.
Attachment #8798531 - Flags: feedback?(wlachance) → feedback+
Attachment #8798531 - Attachment is obsolete: true
Hi jmaher,
From working on the code, I believe that the columns testtype, buildtype, and platform from job priority form a composite unique index.

Is this accurate? I would like to update the models to reflect this.
Flags: needinfo?(jmaher)
yes, this is correct!
Flags: needinfo?(jmaher)
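As a sketch of what that constraint buys us (hypothetical helper, not Treeherder code): inserting a job priority whose (testtype, buildtype, platform) combination already exists should be rejected, while any change to one of the three columns is allowed.

```python
# Hypothetical in-memory illustration of the composite unique index
# on (testtype, buildtype, platform) for job priorities.
class JobPriorityStore:
    def __init__(self):
        self._seen = set()
        self.rows = []

    def insert(self, testtype, buildtype, platform, priority):
        key = (testtype, buildtype, platform)
        if key in self._seen:
            # the database constraint would raise an IntegrityError here
            raise ValueError("duplicate job priority: %s" % (key,))
        self._seen.add(key)
        self.rows.append({"testtype": testtype, "buildtype": buildtype,
                          "platform": platform, "priority": priority})

store = JobPriorityStore()
store.insert("xpcshell-1", "opt", "linux64", 5)
store.insert("xpcshell-1", "debug", "linux64", 5)  # different buildtype: allowed
try:
    store.insert("xpcshell-1", "opt", "linux64", 1)  # same triple: rejected
except ValueError:
    print("rejected duplicate")
```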
This change refactors the code to make it easier to hack.
It also enforces the composite unique index to prevent bugs in the code.

DO NOT MERGE. I want to rebase the code and squash the 3 commits into one single refactoring commit.
Attachment #8801857 - Flags: review?(jmaher)
Attachment #8801857 - Flags: feedback?
FYI I rebased and merged those 3 commits. There's no reason to wait.
Attachment #8801857 - Flags: review?(jmaher) → review-
Comment on attachment 8801857 [details] [review]
ouija/seta refactoring PR

Please do not merge if r+. Please let me edit the commits to bring it back to only 3 commits.
Attachment #8801857 - Flags: review- → review?(jmaher)
Attachment #8801857 - Flags: review?(jmaher) → review+
wlach, I would like to gather a few days' worth of jobs that have failed and been starred.
I'm only interested in the ones "fixed by commit", but I want to have all of them to test against real data.

How do I go about this?
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #15)
> wlach, I would like to gather few days worth of jobs that have failed and
> starred.
> I'm only interested on the ones "fixed by commit" but I want to have all of
> them to test against real data.
> 
> How do I go about this?

Hmm, unfortunately this is more difficult than it should be at present. :( The data is distributed between the master treeherder database and the per-project ones. So for you to do this locally, we'd need to dump out the tables for those date ranges and import them into the vagrant instance. This will be a lot easier in a couple weeks, after bug 1178641 is fixed (it would be maybe a 3-4 hour job now, perhaps reduced to 1-2 then)

Maybe the easiest thing would be for us to rebase the heroku prototype against a snapshot of what we have in production (I think we want to do this anyway), and then have you test your code against there (deploying to heroku is basically a one click operation). Does that sound reasonable?
Yes! That works for me.
How long will this take? (just so I can prioritize)
could you use the Ouija database for this?
(In reply to Joel Maher ( :jmaher) from comment #18)
> could you use the Ouija database for this?

For the final product I don't want this.
For now, for the prototype, I think I could use the failures endpoint from the seta-dev instance.
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #17)
> Yes! That works for me.
> How long will this take? (just so I can prioritize)

I am hoping to have most of bug 1178641 wrapped up by early November.
Comment on attachment 8798533 [details] [review]
[treeherder] armenzg:seta > mozilla:master

https://github.com/mozilla/treeherder/pull/1957
Attachment #8798533 - Attachment filename: Github pull request #1900 → Github pull request #1957
Attachment #8798533 - Attachment is obsolete: true
wlach: do you have time to meet on Tuesday/Wednesday?

The way that SETA currently works is by having certain scripts run on a certain schedule.

There are mainly two concepts involved.
1) Create job priorities based on runnable api info (either with high value or low value)
2) Analyze failures (those marked as fixed by commit) in the last X days and update the jobs priority table

I am currently working on porting the script for #1; however, I want to know whether it is possible to run a command like this on a schedule:
> ./manage.py update_jobs_priority

This script currently works by downloading the latest data from the runnable jobs API. It then determines which jobs are new and inserts them with a given set of defaults (including an expiration date). On the first run, all jobs are inserted with a different set of defaults.

I could change how the logic of this script works to skip runnable jobs completely. To do this, I would need to be notified when a new job is detected by Treeherder and create a job priority for it.

Another way would be that every time a gecko decision task completes we update a runnable jobs table (to be created).

If you prefer we can stick to running the command on a schedule and follow up with further optimizations later on.

----------

Without knowing enough about #2, I believe we would not need to run #1 on a schedule if we implement #2 like this:
Once a day we analyze failures and update the jobs priority table as we would normally do. If a job has never been added we add it as a high value job priority.

I might switch gears to this for a bit and see what is possible.

For now I will query allizom.mozilla.org for failures data until there is a way for me to get a TH DB dump.
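The update logic for #1 can be sketched roughly like this (names, key shapes, and default values are illustrative guesses, not the actual implementation):

```python
# Rough sketch of the update_jobs_priority flow described above:
# only jobs not yet known are inserted, and the first run uses a
# different set of defaults. All names and values are illustrative.
def update_job_priorities(known, runnable, defaults, first_run_defaults,
                          first_run=False):
    """known: dict keyed by (testtype, buildtype, platform) tuples.
    runnable: iterable of such keys from the runnable jobs API."""
    chosen = first_run_defaults if first_run else defaults
    for key in runnable:
        if key not in known:
            known[key] = dict(chosen)  # copy so rows stay independent
    return known

# Which priorities/timeouts apply in each case is a guess here.
DEFAULTS = {"priority": 1, "timeout": 0}
FIRST_RUN_DEFAULTS = {"priority": 5, "timeout": 5400}
```

A new job detected on a later run would get DEFAULTS, while existing rows are left untouched for analyze-failures to adjust.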
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Since we're moving to django rest framework, I assume I don't need to keep treeherder/seta/{views.py,urls.py}.

Do I even want a seta app enabled? Should models.py move somewhere else?

I've also added some of the data for you to inspect [1]

How can I run the tests quickly? ./runtests.sh runs everything.

jmaher: I'm going to call the endpoint seta/high-value-jobs; does that work for you?


[1]
mysql> select * from seta_jobpriority where buildsystem='taskcluster' limit 5;
+------+----------------------------+-----------+----------+----------+---------+-------------+-----------------+
| id   | testtype                   | buildtype | platform | priority | timeout | buildsystem | expiration_date |
+------+----------------------------+-----------+----------+----------+---------+-------------+-----------------+
| 1380 | xpcshell-1                 | opt       | linux64  |        5 |    5400 | taskcluster | NULL            |
| 1381 | reftest-6                  | opt       | linux64  |        5 |    5400 | taskcluster | NULL            |
| 1382 | jittest-6                  | opt       | linux64  |        5 |    5400 | taskcluster | NULL            |
| 1383 | mochitest-browser-chrome-6 | debug     | linux64  |        5 |    5400 | taskcluster | NULL            |
| 1384 | xpcshell-2                 | debug     | linux64  |        5 |    5400 | taskcluster | NULL            |
+------+----------------------------+-----------+----------+----------+---------+-------------+-----------------+
5 rows in set (0.00 sec)

mysql> select * from seta_jobpriority limit 5;
+----+----------------------+-----------+-------------+----------+---------+-------------+-----------------+
| id | testtype             | buildtype | platform    | priority | timeout | buildsystem | expiration_date |
+----+----------------------+-----------+-------------+----------+---------+-------------+-----------------+
|  1 | web-platform-tests-1 | debug     | windows8-64 |        5 |    5400 | buildbot    | NULL            |
|  2 | web-platform-tests-3 | debug     | windows8-64 |        5 |    5400 | buildbot    | NULL            |
|  3 | web-platform-tests-2 | debug     | windows8-64 |        5 |    5400 | buildbot    | NULL            |
|  4 | web-platform-tests-5 | debug     | windows8-64 |        5 |    5400 | buildbot    | NULL            |
|  5 | web-platform-tests-4 | debug     | windows8-64 |        5 |    5400 | buildbot    | NULL            |
+----+----------------------+-----------+-------------+----------+---------+-------------+-----------------+
5 rows in set (0.00 sec)
Attachment #8804853 - Flags: feedback?(wlachance)
high-value-jobs is probably incorrect; technically (and in practice) we return the low value jobs by default.
I have a seta/v1/low-value-jobs endpoint which accepts build_system_type={buildbot,taskcluster}.
Depending on the parameter it will show either Buildbot jobs or TaskCluster ones.

I now have to focus on returning the same data as our current production endpoints:
> e.g. reftest-13 vs desktop-test-linux64-pgo/opt-reftest-13
> e.g. web-platform-tests-e10s vs Windows 7 VM 32-bit mozilla-inbound debug test marionette-e10s
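A naive sketch of the kind of mapping involved, with heuristics guessed from the examples above (the real code has to handle many more name shapes):

```python
# Hypothetical helpers mapping ref_data_name to SETA's short testtype.
# These heuristics are only illustrations based on the two examples above.
def tc_testtype(ref_data_name):
    # "desktop-test-linux64-pgo/opt-reftest-13" -> "reftest-13"
    # drop everything before "/" and the leading buildtype token
    return ref_data_name.split("/", 1)[1].split("-", 1)[1]

def bb_testtype(ref_data_name):
    # "Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-e10s-1"
    # -> "mochitest-e10s-1": the last whitespace-separated token
    return ref_data_name.split()[-1]

print(tc_testtype("desktop-test-linux64-pgo/opt-reftest-13"))  # reftest-13
print(bb_testtype("Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-e10s-1"))
```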
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Good progress!
Attachment #8804853 - Flags: feedback?(wlachance) → feedback+
I'm not requesting feedback in this comment.

I've pushed an administrative command which talks to alertmanager to import test failures.

The current PR can be bootstrapped by simply calling this command:
> ./manage.py analyze_failures

I'm now going to fix this endpoint to return the data properly:
http://local.treeherder.mozilla.org/api/seta/v1/low-value-jobs/?build_system_type=taskcluster

I'm waiting on wlach's change where he will be unifying the Treeherder databases.
At that point I could start using the new models.

The next steps will be to write unit tests and to look into Celery beat to schedule the analysis of failures.

I also want to add data validators to the SETA models to ensure integrity.
Hi jmaher, I've been able to modify the end point to return TaskCluster's data as returned in production.
I've also modified the endpoint to optionally accept a priority value. This means that we can:
* set no priority --> all jobs
* priority={1,5} --> high or low value jobs

Keeping the priority flag would obviously suggest renaming the endpoint to something that does not say "low" in it.
That's a discussion for later.

The reason I want to bring this up is that I've noticed that the production data only gives 4 jobs with priority 1 out of a total of 670 jobs.
IIUC it would mean that we only run 4 TaskCluster jobs per push when SETA is enabled.

In other words, there might be a bug in the current production system and I can reproduce it on my local prototype.

STR:
git clone https://github.com/armenzg/treeherder.git
cd treeherder
git checkout seta_rewrite
vagrant up
vagrant ssh
./manage.py runserver
Open http://localhost:8000/api/seta/v1/low-value-jobs/?build_system_type=taskcluster&priority=5
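The optional priority parameter described above behaves roughly like this (a sketch with made-up data; not the view code itself):

```python
# Sketch of the priority filter on the low-value-jobs endpoint:
# no priority -> all jobs; priority={1,5} -> high or low value jobs only.
def filter_jobs(jobs, priority=None):
    if priority is None:
        return jobs
    return [j for j in jobs if j["priority"] == priority]

jobs = [{"testtype": "xpcshell-1", "priority": 5},
        {"testtype": "reftest-6", "priority": 5},
        {"testtype": "mochitest-1", "priority": 1}]
print(len(filter_jobs(jobs)))              # 3: everything
print(len(filter_jobs(jobs, priority=1)))  # 1: only the high value job
```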
I would also like to discuss whether we can shift away from testtype (manipulated data) and instead use ref_data_name and other values already in Treeherder.
I think most of SETA's code deals with trying to keep these tied (besides fetching data from endpoints).
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #30)
> I also would like to discuss if we can shift away from testtype (manipulated
> data) and instead use ref_data_name and other values already in Treeherder.

Hey Armen, I think it's a great idea because all this data is fetched (directly or indirectly) from Treeherder.
while I would like to have data from ref_data_name, we need to make sure that it is properly handled.  If we have a bunch of random names, then SETA becomes ineffective.  Can you give an example of the linux64 opt mochitest-1 e10s ref_data_name for TC and BB?  Possibly that will help.

regarding the 4 jobs for priority 1, I do not see that:
http://seta.herokuapp.com/data/setadetails/?taskcluster=1&priority=1
Buildbot: [1]
* "ref_data_name": "Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-e10s-1",
* "build_platform": "linux64",
* "platform_option": "opt",

TaskCluster: [2]
* "ref_data_name": "desktop-test-linux64/opt-mochitest-e10s-1",
* "build_platform": "linux64",
* "platform_option": "opt",


[1]
{
            "option_collection_hash": "102210fe594ee9b33d82058545b1ed14f4c8206e",
            "ref_data_name": "Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-e10s-1",
            "build_platform": "linux64",
            "job_type_name": "Mochitest e10s",
            "job_group_id": 9,
            "platform_option": "opt",
            "machine_platform_os": "linux",
            "machine_platform_id": 9,
            "job_coalesced_to_guid": null,
            "platform": "linux64",
            "result": "runnable",
            "state": "runnable",
            "job_group_name": "Mochitest e10s",
            "job_type_description": "multiprocess - integration tests",
            "machine_platform_architecture": "x86_64",
            "job_type_symbol": "1",
            "job_group_description": "",
            "build_system_type": "buildbot",
            "build_platform_id": 9,
            "build_architecture": "x86_64",
            "job_type_id": 152,
            "job_group_symbol": "M-e10s",
            "build_os": "linux"
        },
[2]
        {
            "state": "runnable",
            "job_group_name": "Mochitests executed by TaskCluster with e10s",
            "job_type_description": "Mochitest plain run",
            "ref_data_name": "desktop-test-linux64/opt-mochitest-e10s-1",
            "build_platform": "linux64",
            "job_type_symbol": "1",
            "build_system_type": "taskcluster",
            "job_type_name": "desktop-test-linux64/opt-mochitest-e10s-1",
            "platform_option": "opt",
            "job_group_symbol": "tc-M-e10s",
            "result": "runnable",
            "job_coalesced_to_guid": null,
            "platform": "linux64"
        },
Depends on: 1178641
SETA needs to return ref_data_name information to Buildbot and TaskCluster.

IIUC the runnable jobs API works as follows:
* If a decision task ID is provided go and interact with the gecko decision task to determine what to return
* For Buildbot jobs go and get the data from the runnable_job table

It seems that the TC jobs are not kept on that table.

I can create an administrative command to keep the TaskCluster jobs up-to-date in that table (adding new jobs and removing old ones).

I would like to know if we could call such a command every time a gecko decision task completes (for certain repositories) or on a schedule (every 4 hours would be sufficient).
unless we depend more heavily on preseed.json or other override mechanisms not yet thought up, there is no need to keep the jobs up to date.  But in the spirit of not coding in assumptions, we should work to update the table as frequently as possible, keeping in mind that our gecko decision task should stay fast and we shouldn't be bogging down the treeherder servers or locking the database.
wlach: I have a question with respect to adding a task; I want to call:
> ./manage.py analyze_failures

Should I move most of the logic of the administrative command into a module? That way we can use it for the task as well as the administrative command; what do you think?
Flags: needinfo?(wlachance)
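The refactoring being proposed would look something like this (file paths follow the ones that later appear in the coverage report; the function bodies are placeholders, and the real management command would subclass django.core.management.base.BaseCommand):

```python
# treeherder/seta/analyze_failures.py -- shared logic lives in a module
def analyze_failures(days=90):
    # placeholder for the real analysis of "fixed by commit" failures
    return {"days_analyzed": days}

# treeherder/seta/management/commands/analyze_failures.py -- thin wrapper
class Command:  # stands in for BaseCommand in this sketch
    def handle(self, *args, **options):
        return analyze_failures()

# treeherder/seta/tasks.py -- the scheduled (celery beat) task is another
# thin wrapper around the same module-level function
def seta_analyze_failures():
    return analyze_failures()
```

This way the command, the task, and the tests all exercise one code path.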
I forgot to mention that yesterday I added Buildbot support.

I'm now focused on adding support for TaskCluster requests.
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #36)
> wlach: I have this wrt to adding a task; I want to call:
> > ./manage.py analyze_failures
> 
> Should I move most of the logic of the administrative command into a module?
> That way we can use it for the task as well as the administrative command;
> what do you think?

Yep, this is a great idea.
Flags: needinfo?(wlachance)
I've further improved the endpoint to support TaskRequest and other logic from the original source code.
I'm focusing on writing tests for it.
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

This change creates a scheduled beat task that analyzes failures for the last 90 days. It then goes and adjusts the job priority table.

This also adds an end point that returns what to schedule for Buildbot and for TaskCluster.

My main focus while this is reviewed is to add more tests, especially for the new endpoint.

Once wlach's work lands I will look into changing how we get failures from alertmanager.

Some of this code could also go away if we refactored the runnable jobs API to actually keep TaskCluster's data inside the runnable jobs table.
Attachment #8804853 - Flags: review?(wlachance)
Attachment #8804853 - Flags: review?(jmaher)
Attachment #8804853 - Flags: feedback+
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

pretty close- a few issues left in the PR
Attachment #8804853 - Flags: review?(jmaher) → review-
Attached file Task list
This task list helps me keep an eye on the big-picture requirements.
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Cancelling review for now.
Attachment #8804853 - Flags: review?(wlachance)
In my last changes I have worked on testing the endpoint which the Gecko decision task and Buildbot would be hitting.
I've reworked the test infrastructure to store the job priorities directly in the test table.
This makes the tests slower, but it makes writing tests easier and more realistic.
The fixtures are also derived from a single data fixture.

I've also squashed all the commits and rebased.

Tomorrow and during the time in Hawaii I will focus on switching off requesting failures data from allizom. We will instead query Treeherder directly.

After that, we will need to add support for preseed.
jmaher: how does the setadetails endpoint know, for Buildbot, whether to return low value jobs or not?

I'm looking at this code (used in alertmanager):
https://github.com/mozilla/ouija/blob/production-alertmanager/src/server.py#L438-L501

Is the consumer (the Buildbot master) in charge of keeping track of the counter and timeout?
buildbot sends inactive=1 (low value) in a URL param and that is how we determine what to do.  buildbot does its own coalescing and will call seta when it determines there is a need for coalescing.
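In other words, something like this minimal sketch of the parameter handling (not the actual server code; names are illustrative):

```python
# Sketch of how the setadetails endpoint could interpret Buildbot's
# inactive=1 URL parameter. Purely illustrative.
def setadetails(params):
    low_value_requested = params.get("inactive") == "1"
    return "low value jobs" if low_value_requested else "all jobs"

print(setadetails({"inactive": "1"}))  # low value jobs
print(setadetails({}))                 # all jobs
```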
I've addressed all comments by wlach from the review last Thursday.

I will now be focusing on obtaining annotated failures directly from Treeherder.
My latest commit makes SETA stop using alertmanager for failures[*].
We now use TH's JobNote model.
We don't limit ourselves to 90 days but use all job notes (which expire after 4 months).

The remaining milestones are:
* stop using Seta on Heroku's API
* add preseed support

jmaher: I've reduced the scope to everything needed to make TaskCluster work with SETA on Treeherder.
Could you please review the stretch goals and see if there's any that you believe is required before going live with this?
https://github.com/armenzg/treeherder/issues/1#issuecomment-263397421
I should have the PR ready for review early next week.
Also, this can be tested locally with the STR [1]

[*] I discovered that we still use more of Heroku SETA's API

[1]
STR:
git clone https://github.com/armenzg/treeherder.git
cd treeherder
git checkout seta_rewrite
vagrant up
vagrant ssh
./manage.py runserver
Open http://localhost:8000/api/project/mozilla-inbound/seta/v1/job-priorities/?build_system_type=taskcluster
Open http://localhost:8000/api/seta/v1/failures-fixed-by-commit/

On another tab:
vagrant ssh
./manage.py ingest_push mozilla-inbound  d70bb3fba851
./manage.py ingest_push mozilla-inbound  5a5d50943bbc
Open http://localhost:8000/#/jobs?repo=mozilla-inbound
Annotate few jobs with "fixed by commit" and the latest revision
./manage.py update_runnable_jobs
./manage.py analyze_failures
Flags: needinfo?(jmaher)
I think the stretch goals are nice- obviously all of them would be good to have, and I don't think we should stop working on SETA until that is the case.  A few things I care about are the unit tests- what coverage do we have?

Also, what is the strategy for dealing with fixed_by_commit='' ?
Flags: needinfo?(jmaher)
I'm ignoring blank fixed_by_commit since I _assume_ it pollutes the results.

I don't know the right syntax to include treeherder/etl/seta.py but exclude all other files in that directory; thus I'm running that report separately [1] from the main one [2].

50% coverage for everything under treeherder/seta and 90% for treeherder/etl/seta.py

[1]
vagrant ~/treeherder $ coverage run --source=treeherder/etl -m py.test tests/seta
...
treeherder/etl/seta.py                                           97      8     18      4    90%

[2]
vagrant ~/treeherder $ coverage run --source=treeherder/seta -m py.test tests/seta
vagrant ~/treeherder $ coverage report
Name                                                          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------------------------------------------------------
treeherder/seta/__init__.py                                       0      0      0      0   100%
treeherder/seta/analyze_failures.py                              38     12     10      1    65%
treeherder/seta/common.py                                         9      0      2      0   100%
treeherder/seta/job_priorities.py                                71     11     28      7    78%
treeherder/seta/management/__init__.py                            0      0      0      0   100%
treeherder/seta/management/commands/__init__.py                   0      0      0      0   100%
treeherder/seta/management/commands/analyze_failures.py          11     11      0      0     0%
treeherder/seta/management/commands/update_runnable_jobs.py       8      8      0      0     0%
treeherder/seta/models.py                                        60     16     16      1    67%
treeherder/seta/runnable_jobs.py                                 28     15      4      0    41%
treeherder/seta/seta.py                                         119    105     52      0     8%
treeherder/seta/tasks.py                                          4      4      0      0     0%
treeherder/seta/update_job_priority.py                           75     10     22      3    85%
-----------------------------------------------------------------------------------------------
TOTAL                                                           423    192    134     12    50%
one way to validate this is looking at the p1 tasks compared to heroku- we could adjust it to support 4 months of data (or hack treeherder seta to use 90 days) and compare the output- ideally it would be identical.

The coverage data is looking great- I think our goal was 60% coverage.
I've fixed a small regression, and I've removed the last call to any other SETA/ouija server (AFAICT).

I'm now moving on to the preseed support.
Attached file Commit for preseed support (obsolete) —
jmaher: I've landed preseed support; however, I have difficulty understanding some of the intentions of the expiration_date field.

Why do we set it a year from now?
Why do we clear the expiration a year later?
Thinking of the talos rules, what do we gain by setting an expiration date or not?

In general, what is the meaning of the values expiration_date can have?
A) a date is set
   - analyze failures will _not_ change the values of priority/timeout
B) a date is not set (null)
   - analyze failures could change the values of priority/timeout

Should we have a value for "never expire this job priority"? (e.g. set expiration_date to '*' or 'forever').

On a wider note, I don't think we need this preseed support. We could get away with allowing you to login and insert new job priorities or modify existing ones.
Attachment #8820092 - Flags: feedback?(jmaher)
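The two cases above can be sketched as follows (hypothetical helper; it assumes an expired date behaves the same as no date, which matches "we clear the expiration a year later"):

```python
from datetime import datetime

# Sketch of the expiration_date semantics described above:
# A) a future date is set -> analyze_failures must NOT change
#    priority/timeout; B) null (None) or an expired date -> it may.
def analyze_failures_may_modify(expiration_date, now=None):
    if expiration_date is None:
        return True
    now = now or datetime.now()
    return expiration_date <= now  # expired freezes no longer apply

print(analyze_failures_may_modify(None))  # True
print(analyze_failures_may_modify(datetime(2100, 1, 1),
                                  now=datetime(2017, 1, 1)))  # False
```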
preseed.json is a hack, but the concept serves a purpose to allow us to override things temporarily or longer term.

I am glad you brought up the question about setting the expiration for one year- that is something which was added in trying to support taskcluster, and while it will work, there are probably many issues- anything that has an expiration of one year should really be "never expires".

Instead of throwing away preseed support right now, I would prefer that we propose a better solution using in-tree configs or some other in-tree file- and I see that changing a lot in the world of per manifest reporting and scheduling.
Comment on attachment 8820092 [details]
Commit for preseed support

this looks sane.  One thing to consider on changing the date for preseed "no expiration" is that the jobpriorities field might be a datetime, and we could have restrictions on the data we put in there, or odd behaviors from the default value we use for null.
Attachment #8820092 - Flags: feedback?(jmaher) → feedback+
I'm now using the year 2100 as the expiration date.
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

This is ready to be reviewed.
Attachment #8804853 - Flags: review?(wlachance)
Attachment #8804853 - Flags: review?(jmaher)
Attachment #8804853 - Flags: review-
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

I think this is really close- I would like to address the comments left on the PR before making this an r+.

Great job on getting this into a reviewable state and so complete!
Attachment #8804853 - Flags: review?(jmaher) → review-
Thanks jmaher! I will look into it.

wlach: I addressed what we discussed in my last commit https://github.com/mozilla/treeherder/pull/1957/commits/f7c1bcc55657c80fc6380206eb106a3cd547291b
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

A few small things I would like changed but didn't see any huge red flags (though I confess I don't understand seta all that well). I'd say we're pretty close to being able to merge this and seeing how it works. Please re-r? when you think you have something ready to go.
Attachment #8804853 - Flags: review?(wlachance) → review-
Blocks: 1286358
I've addressed most of jmaher's feedback.

I will be tackling wlach's feedback soon.

Current coverage is at 64%.
vagrant ~/treeherder $ coverage run --source=treeherder/seta,treeherder/etl/seta.py -m py.test tests/seta && coverage html -i && coverage report -i
============================================================================================================ test session starts =============================================================================================================
platform linux2 -- Python 2.7.12, pytest-3.0.3, py-1.4.31, pluggy-0.4.0
Django settings: tests.settings (from ini file)
rootdir: /home/vagrant/treeherder, inifile: pytest.ini
plugins: django-3.0.0
collected 16 items

tests/seta/test_analyze_failures.py .
tests/seta/test_high_value_jobs.py .
tests/seta/test_job_priorities.py .
tests/seta/test_models.py .....
tests/seta/test_update_job_priority.py ........

========================================================================================================= 16 passed in 19.00 seconds =========================================================================================================
Name                                                          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------------------------------------------------------
treeherder/etl/seta.py                                           57      1     18      1    97%
treeherder/seta/__init__.py                                       0      0      0      0   100%
treeherder/seta/analyze_failures.py                              32     10     10      1    64%
treeherder/seta/common.py                                         9      0      2      0   100%
treeherder/seta/high_value_jobs.py                               72     30     36      4    50%
treeherder/seta/job_priorities.py                                69      9     26      7    81%
treeherder/seta/management/__init__.py                            0      0      0      0   100%
treeherder/seta/management/commands/__init__.py                   0      0      0      0   100%
treeherder/seta/management/commands/analyze_failures.py          11     11      0      0     0%
treeherder/seta/management/commands/load_preseed.py               8      8      0      0     0%
treeherder/seta/management/commands/update_runnable_jobs.py       8      8      0      0     0%
treeherder/seta/models.py                                        60     16     16      1    67%
treeherder/seta/preseed.py                                       39     39     20      0     0%
treeherder/seta/runnable_jobs.py                                 33      6      6      2    74%
treeherder/seta/tasks.py                                          7      7      0      0     0%
treeherder/seta/update_job_priority.py                           76     10     22      3    85%
-----------------------------------------------------------------------------------------------
TOTAL                                                           481    155    156     19    64%
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

I've addressed all raised concerns.

I've squashed older commits but left the ones since you had a look.

I need to look at a test failure on Travis and the fix should be a small delta.
Attachment #8804853 - Flags: review?(wlachance)
Attachment #8804853 - Flags: review?(jmaher)
Attachment #8804853 - Flags: review-
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Please let me know when travis is happy so I can test this on stage before we actually merge the PR. Congrats on getting this large chunk of work completed!
Attachment #8804853 - Flags: review?(wlachance) → review+
Thank you Will!
As I suspected, we have an _intermittent_ orange. I've suspected this for a while, but now I have proof of it.
The fixture has 39643b5073cfb9473042884bfd3ced0289b3d7dd twice while the produced data has 773fa0d78b75d1d129646649ff6c5896c8e4edbf twice.

wlach: should I ignore this test failure?
https://travis-ci.org/mozilla/treeherder/jobs/186172233



(Pdb) pp(failures_fixed_by_commit)
{u'you look like a man-o-lantern': [(u'1824f3350e2152d3272de09c8ec9b103cd3ad667',
                                     u'debug',
                                     u'osx-10-6'),
                                    (u'2c057fb1cd2b6f8bd74121238d0287063d8f5562',
                                     u'opt',
                                     u'b2g-device-image'),
                                    (u'39643b5073cfb9473042884bfd3ced0289b3d7dd',
                                     u'debug',
                                     u'b2g-emu-jb'),
                                    (u'39643b5073cfb9473042884bfd3ced0289b3d7dd',
                                     u'debug',
                                     u'b2g-emu-jb'),
                                    (u'58cab069cf08211159774de948094dd963fb9d44',
                                     u'debug',
                                     u'osx-10-7'),
                                    (u'8d63b645c221e45ccf4a2fbc087fa0088535042c',
                                     u'debug',
                                     u'windowsxp'),
                                    (u'93e11a88ea92cfa4d930072b22afca002c53f249',
                                     u'opt',
                                     u'b2g-device-image'),
                                    (u'df822a71f7727bf3cb6d3b2aeba970d8a6d33ea1',
                                     u'opt',
                                     u'b2g-emu-jb'),
                                    (u'f2e06a487b304c16843fe53782a5888e5582be6b',
                                     u'debug',
                                     u'windows7-32'),
                                    (u'fe762310e2f921b93df828ac68ee958f19e2d759',
                                     u'debug',
                                     u'windows8-32')]}
(Pdb) pp(data)
{u'you look like a man-o-lantern': [(u'1824f3350e2152d3272de09c8ec9b103cd3ad667',
                                     u'debug',
                                     u'osx-10-6'),
                                    (u'2c057fb1cd2b6f8bd74121238d0287063d8f5562',
                                     u'opt',
                                     u'b2g-device-image'),
                                    (u'58cab069cf08211159774de948094dd963fb9d44',
                                     u'debug',
                                     u'osx-10-7'),
                                    (u'773fa0d78b75d1d129646649ff6c5896c8e4edbf',
                                     u'debug',
                                     u'b2g-emu-jb'),
                                    (u'773fa0d78b75d1d129646649ff6c5896c8e4edbf',
                                     u'debug',
                                     u'b2g-emu-jb'),
                                    (u'8d63b645c221e45ccf4a2fbc087fa0088535042c',
                                     u'debug',
                                     u'windowsxp'),
                                    (u'93e11a88ea92cfa4d930072b22afca002c53f249',
                                     u'opt',
                                     u'b2g-device-image'),
                                    (u'b386afd1afbc09b5f7f13af8607a65a87e105002',
                                     u'opt',
                                     u'b2g-emu-jb'),
                                    (u'f2e06a487b304c16843fe53782a5888e5582be6b',
                                     u'debug',
                                     u'windows7-32'),
                                    (u'fe762310e2f921b93df828ac68ee958f19e2d759',
                                     u'debug',
                                     u'windows8-32')]}
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

great work Armen!
Attachment #8804853 - Flags: review?(jmaher) → review+
Attached file Commits view (obsolete) —
Hi Will, signatures were not being generated properly in the tests because the sample data used 'buildername' instead of 'reference_data_name'. I describe this in the commit message.

I've addressed the data fixture properly. No code changes. I'm not sure why the wrong signatures hadn't caused problems sooner.
https://github.com/mozilla/treeherder/pull/1957/commits/95a4c849434b8182fb3b3199ede6428bf7f54ea9

Do you want me to fix this commit in a different PR? Or simply land 2 commits, one for SETA and one for the sample data fix?

I'm now going to look into a bunch of test failures caused by touching the data :/
Attachment #8821647 - Flags: feedback?(wlachance)
Comment on attachment 8821647 [details]
Commits view

Hi Will,
This is ready to go to stage. All tests are green:
https://travis-ci.org/mozilla/treeherder/builds/187054914

I worked a bit more on the job_data.txt and adjusted the tests that depend on it.

The first commit shows everything you've already reviewed.
The second commit shows the sample data and test changes.
The third commit shows a fixture change. This commit should be folded with the first.

In other words, commits 1 & 3 will be squashed.
Attachment #8821647 - Attachment description: Fix sample data → Commits view
Attachment #8821647 - Flags: feedback?(wlachance) → review?(wlachance)
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/3a128e455dc2e301f77064b6ca02b431e1cdc6e7
Bug 1306709 - Fix sample data to make signatures be buildernames rather than a hash (#2044)

_load_jobs() from treeherder/model/derived/jobs.py will add a signature
hash to signatures instead of the buildername if we don't use
'reference_data_name' for the buildername.

This change does the following:
* Replace 'buildername' with 'reference_data_name'
* Replace 'b2g26_v1_2' with 'release'
Comment on attachment 8821647 [details]
Commits view

As discussed on irc, the fixture/test changes are fine, we've merged them in.

In the future, please don't attach github commits to bugs manually. Autolander will do that for you. :)
Attachment #8821647 - Flags: review?(wlachance)
Attachment #8808981 - Attachment is obsolete: true
Attachment #8821647 - Attachment is obsolete: true
What happened today:
* We merged the code which fixes TH's BB sample data to master
  * We rebased the branch and squashed the small fixture fix
* We deployed the code to Heroku's treeherder-stage app
* We noticed a DoesNotExist error
  * wlach gave me write access to the treeherder repo and taught me how to deploy changes
  * It seems that job notes on production can have no associated job (very odd - bug 1326090)
  * A fix has been committed on the seta_rewrite branch
* jmaher noticed some issues in the priorities for the API

What's next?
* In January I will address the issue from jmaher & the comments from emorley
* Once these are addressed I don't think there will be any more issues before merging to master
* Look into enabling consuming this data from Buildbot and TaskCluster
* Smaller items listed in https://github.com/armenzg/treeherder/issues/1#issuecomment-263397421
Blocks: 1326102
Today I will be focusing on addressing emorley's concern.

I will address the issue raised by jmaher in a follow-up PR, as I'm more interested in seeing this merged and running in production.
I want to polish the product for bugs that show up with real production data.
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

2nd commit for emorley - addressing your feedback
3rd commit for wlach - adding back eleven jobs with notes fixture
Attachment #8804853 - Flags: review?(wlachance)
Attachment #8804853 - Flags: review?(emorley)
Attachment #8804853 - Flags: review+
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

lgtm!
Attachment #8804853 - Flags: review?(wlachance) → review+
I would like to file a new bug with a new PR for any discovered issues on stage.
Unless I hear any objections I will be filing it sometime today.
I've created a Papertrail alert for treeherder-stage to let me know of any issues.
"((api AND seta) OR "seta ") AND (error OR warning OR traceback)"

I hope I got the syntax right and I will add the same in production if it works as expected.
Attached file Commit with feedback from emorley (obsolete) —
The branch is currently messed up. Here's the commit that I want emorley to look at.

I will fix "seta_rewrite" to contain the 3 commits I want merged (We will squash them before merging).

"seta_followup" contains all commits.
Attachment #8823435 - Flags: review?(emorley)
Attachment #8804853 - Flags: review?(emorley)
Comment on attachment 8823435 [details]
Commit with feedback from emorley

My apologies for the unnecessary noise. I've fixed the branch.
Attachment #8823435 - Attachment is obsolete: true
Attachment #8823435 - Flags: review?(emorley)
Attachment #8804853 - Flags: review?(emorley)
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Great work with this!

I've left some comments on the PR - the ones that I'd like to see addressed (or at least discussed if not changing):
* Why load_preseed runs every 24 hours and not just on deploy (https://github.com/mozilla/treeherder/pull/1957#discussion_r94605121)
* Using fetch_json rather than make_request (https://github.com/mozilla/treeherder/pull/1957#discussion_r94604223)
* Switching self-API calls to ORM usages to avoid regressing bug 1211414 (https://github.com/mozilla/treeherder/pull/1957#discussion_r94604390)
* Clarifying why behaviour varies according to User-Agent and possibly switching to an API parameter/different endpoint name instead (https://github.com/mozilla/treeherder/pull/1957#discussion_r94606474)
* Clarifying what happens to taskcluster/buildbot CI uptime if there are issues with accessing the SETA Treeherder API endpoint (https://github.com/mozilla/treeherder/pull/1957#discussion_r94606474)

Almost there! :-)
Attachment #8804853 - Flags: review?(emorley) → review-
I've filed a follow up bug 1328618.
I've moved most of the logic behind the runnable jobs API into treeherder/etl/runnable_jobs.py (formerly known as treeherder/etl/allthethings.py):
https://github.com/mozilla/treeherder/pull/1957/commits/6a3efb12293e3e98d8a2e96f2a6a9083a6337c64

Tomorrow I will spend some time to completely decouple it from SETA and address your remaining comments in the PR.
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

I've addressed all feedback.

The following commits can be squashed with the main SETA commit:
* Load preseed.json on startup rather than on a schedule
* Stop calling runnable jobs API but use list_runnable_jobs 
* The Gecko decision task should call the API with increase_counter

The other commits can land separately.

I can fix the commits when I get an r+.
Attachment #8804853 - Flags: review- → review?(emorley)
Comment on attachment 8804853 [details] [review]
[treeherder] armenzg:seta_rewrite > mozilla:master

Awesome! Thank you for your persistence :-)
Attachment #8804853 - Flags: review?(emorley) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/d40b042baea2a0fc96c9dcd865d10de08e94694e
Bug 1306709 - Move runnable jobs api logic into the runnable_jobs module

Until now, the only way to get runnable jobs information is by
querying the Treeherder runnable jobs api.

After this change, Treeherder modules won't need to call the API;
they can import the same function the API calls (list_runnable_jobs()).

https://github.com/mozilla/treeherder/commit/a62081f823deca9366c62d37a331e9910215aca6
Bug 1306709 - Job objects can now determine the platform option

We have a few places that try to calculate the platform_option of a job.
This change adds a method to the Job class and
an OptionCollectionManager to help it.

https://github.com/mozilla/treeherder/commit/ed8e39221da464af01dadc238886da077fe9f9ca
Bug 1306709 - Add SETA to Treeherder
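The pattern from the first commit above (calling list_runnable_jobs() directly instead of going through the runnable jobs HTTP API) can be sketched roughly like this. The function body below is a canned stand-in for illustration, not the real Treeherder implementation, and the endpoint URL in the comment is likewise illustrative:

```python
# Stand-in for treeherder.etl.runnable_jobs.list_runnable_jobs();
# the real function builds the runnable-jobs list for a repository.
def list_runnable_jobs(repo_name):
    return [{"repo": repo_name, "job_type_name": "build"}]

# Before: an in-process caller would make an HTTP round-trip to its own
# API, e.g. fetch_json(".../api/project/<repo>/runnable_jobs/").
# After: import and call the shared function directly, avoiding the
# self-API call (the concern raised in review, see bug 1211414).
jobs = list_runnable_jobs("mozilla-inbound")
print(jobs[0]["repo"])
```

The direct call keeps everything in-process, so there's no dependency on the web tier being up just to reuse its own logic.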
This has now been merged to master, which means treeherder-stage has the changes live (as long as anyone deploying their own branch has based it off master).

I will be watching treeherder-stage for valid results and tracking any follow-ups via bug 1328618.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 1330015
Depends on: 1335151
Component: Treeherder → Treeherder: SETA
Product: Tree Management → Tree Management Graveyard