Closed Bug 616011 Opened 14 years ago Closed 10 years ago

Update E2E reports (branch/changeset) to account for gaps in scheduled status

Categories

(Release Engineering :: General, defect, P4)

x86
All
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: lsblakk, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2477] [cruncher][reports])

Attachments

(2 files)

The current e2e report for build runs could feasibly show a revision as being complete if the builds are done but the tests are not all scheduled yet.

Each scheduler keeps track of the last change it has seen in the state column of the schedulers table; that column holds a JSON dict, and one of its keys is last_processed. So the report needs to check two places: first last_processed, then scheduler_changes, where changes accumulate once they have been categorized (important/unimportant). A change that is in scheduler_changes is still pending, as is one whose change number is greater than last_processed.
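A minimal sketch of the two-place check described above, for a single scheduler. The function name and the way the inputs are passed in are hypothetical; only the state JSON with its last_processed key and the scheduler_changes table come from the description above, and the sketch assumes a scheduler with no last_processed value has not processed anything yet.

```python
import json


def is_change_pending(changeid, state_json, scheduler_change_ids):
    """Return True if one scheduler has not finished handling `changeid`.

    state_json           -- contents of the scheduler's `state` column (JSON text)
    scheduler_change_ids -- changeids currently listed in scheduler_changes
                            for this scheduler (hypothetical: passed in as a set)
    """
    state = json.loads(state_json)
    last_processed = state.get("last_processed")

    # Assumption: a missing last_processed means nothing was processed yet.
    if last_processed is None:
        return True

    # The scheduler has not looked at this change yet.
    if changeid > last_processed:
        return True

    # Seen and categorized, but still accumulating in scheduler_changes.
    return changeid in scheduler_change_ids
```

For example, with state '{"last_processed": 100}', change 101 is pending, change 90 is pending only while it is still listed in scheduler_changes, and change 80 is done once it is no longer listed there.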

This needs to be incorporated into how the report is generated so we know we've got the status of everything for a build run.

I'll be using the json output of this report for bug 430942 so that's why this issue arose.  The script that posts to a bug should be able to depend on this report to know that a build run is actually complete and not waiting for tests to be scheduled still.
I tried the following approach, which catlee suggested:

step 1. Find min(last_processed) across the schedulers of the branch the e2e report is displayed for, and then
step 2. Find any changes > min(last_processed) on that branch that ALSO match the revision you're looking at.

However, step 1 returns a very low minimum across all schedulers for the branch, so low that most changes, even the ones already processed, have a higher changeid.

For example, for mozilla-central, min(last_processed) is 3994 (for scheduler mozilla-central-linux-debug-unittest , id 51).

Next, if I ask for all changes with changeid > 3994, branch mozilla-central, and when_timestamp between starttime and endtime (of the e2e times report), the changeids are in the range 197882 - 201707, which is two orders of magnitude larger.

Is there a way to know which subset of schedulers to check for the last_processed? Because looking at all in a branch doesn't help much.

Would a different approach work: forget about last_processed and look at all changes in the changes table that do not have Build Requests created yet. If there are any, the Build Run is not complete yet. wdyt?

catlee? lsblakk? 


Might help:
* list of all schedulers in schedulers table: http://dl.dropbox.com/u/1119078/mozilla/schedulers.html
Yes, 1) requires that you know which schedulers are "active", for some definition of active. Some schedulers run once a day, or even once a week, so their "last processed" will be pretty small, but we want to ignore those for this problem.

I think finding all changes that don't have build requests created yet will work most of the time.  There are some exceptions, like changes with DONTBUILD in the comments.
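As a rough illustration of "changes that don't have build requests yet", here is a sketch against an in-memory SQLite database. The table and column names are a simplified assumption modelled on the changes / sourcestamp_changes / buildsets / buildrequests layout and are not taken from the actual patch; the function names and the demo rows are made up. Note that SQLite's LIKE is case-insensitive for ASCII, so this filter also skips "dontbuild".

```python
import sqlite3

# Assumed, simplified schema (illustration only).
SCHEMA = """
CREATE TABLE changes (changeid INTEGER PRIMARY KEY, branch TEXT,
                      revision TEXT, comments TEXT, when_timestamp INTEGER);
CREATE TABLE sourcestamp_changes (sourcestampid INTEGER, changeid INTEGER);
CREATE TABLE buildsets (id INTEGER PRIMARY KEY, sourcestampid INTEGER);
CREATE TABLE buildrequests (id INTEGER PRIMARY KEY, buildsetid INTEGER);
"""


def make_demo_db():
    """Build a tiny database with invented rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?, ?)", [
        (1, "mozilla-central", "aaa", "fix a bug", 100),
        (2, "mozilla-central", "bbb", "tweak", 110),
        (3, "mozilla-central", "ccc", "docs only DONTBUILD", 120),
        (4, "try", "ddd", "try push", 115),
    ])
    # Only change 1 has a build request.
    conn.execute("INSERT INTO sourcestamp_changes VALUES (10, 1)")
    conn.execute("INSERT INTO buildsets VALUES (20, 10)")
    conn.execute("INSERT INTO buildrequests VALUES (30, 20)")
    return conn


def pending_changes(conn, branch, start, end):
    """Return changeids in [start, end] on `branch` with no build request."""
    rows = conn.execute(
        """
        SELECT c.changeid
        FROM changes c
        WHERE c.branch = ?
          AND c.when_timestamp BETWEEN ? AND ?
          AND COALESCE(c.comments, '') NOT LIKE '%DONTBUILD%'
          AND NOT EXISTS (
              SELECT 1
              FROM sourcestamp_changes sc
              JOIN buildsets bs ON bs.sourcestampid = sc.sourcestampid
              JOIN buildrequests br ON br.buildsetid = bs.id
              WHERE sc.changeid = c.changeid)
        ORDER BY c.changeid
        """,
        (branch, start, end),
    ).fetchall()
    return [row[0] for row in rows]
```

With the demo data, pending_changes(make_demo_db(), "mozilla-central", 0, 200) reports only change 2: change 1 already has a build request, change 3 is excluded by the DONTBUILD filter, and change 4 is on another branch.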
This patch contains the solution in which I list all changes in the timeframe (for e2e report and by revision for Build Run report) and see which changes don't have build requests yet.

I had to modify BuildRequestsQuery to allow the retrieval of multiple rows for a build request, one for each changeid. This is done only when calling with the special parameter changeid_all=True; the default behaviour is still the same as before: one row per build request (or, more specifically, per build).

I created a new function GetBuildRequests, which executes BuildRequestsQuery, groups the rows into BuildRequest objects, and saves all changeids for a build request within the object.
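The grouping step can be pictured roughly as below. This is not the code from the patch: the class name, the row keys (brid, buildername, changeid) and the helper name are hypothetical stand-ins, and the rows are assumed to arrive as dicts with one row per (build request, changeid) pair, as with changeid_all=True.

```python
from dataclasses import dataclass, field


@dataclass
class BuildRequestSketch:
    """Hypothetical stand-in for a BuildRequest object."""
    brid: int
    buildername: str
    changeids: set = field(default_factory=set)


def group_build_requests(rows):
    """Collapse one-row-per-changeid results into one object per build request."""
    requests = {}
    for row in rows:
        br = requests.get(row["brid"])
        if br is None:
            br = BuildRequestSketch(row["brid"], row["buildername"])
            requests[row["brid"]] = br
        # A build request without changes would carry a NULL changeid.
        if row["changeid"] is not None:
            br.changeids.add(row["changeid"])
    return requests
```

Keeping every changeid on the object is what lets the reports later compare the set of changes in the timeframe against the set of changes that already have build requests.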

I added several new fields to BuildRun, BuildRequest, EndtoEndTimesReport, such that when looking at the reports, you have the following new info displayed:
- End to End Times Report:
  * for each build run in the table, the Complete column takes into account whether changes with no build requests exist (I'll call them pending changes from now on)
  * inserted a new column called Pending Changes (at the far right of the table), which lists the pending changes (changeid, branch, revision) found for each build run
  * a small summary table at the top of the report with all pending changes found in the timeframe (changeid, revision, branch and when_timestamp).
  * BONUS: Authors column (with all authors in a build run) and Changes Revisions (all changes.revision values grouped under the same sourcestamps.revision value)

- Build Run Report:
  * in the summary added fields: Is Complete?, Changes Revisions, Authors and Pending Changes (with the same meaning as described above)
  * for the build requests in the table, I added columns: changes revisions, authors, build id (bid) and changeids.
Attachment #496844 - Flags: review?(catlee)
Attachment #496844 - Flags: checked-in?
Attachment #496844 - Flags: review?(catlee) → review+
Attachment #496844 - Flags: checked-in? → checked-in+
I think this is causing issues in the try bugposting script:

https://build.mozilla.org/buildapi/reports/revision/try/9733bd66818e was found by my script to be complete on a run at 6pm on Feb 8:

2011-02-08 18:01:11,647 - try_bugposter - main - Preparing to post results to bugzilla for complete builds
2011-02-08 18:01:11,665 - try_bugposter - PostBugComment - Attempting to post comment in bug: 629593
2011-02-08 18:01:11,665 - try_bugposter - PostBugComment - COMMENT:   Try run for 9733bd66818e had 1 build requests and 1 completed. The results are:
      success: 1
          * ['WINNT 5.2 tryserver build']
      warnings: 0
          * []
      failed: 0
          * []
  
But now it is set to is_complete: no and has pending changes attached to it.
To be clear, what is happening seems to be that a revision is marked complete but that pending changes flip that setting back to incomplete.
Here's another example of a flip-flop on is_complete: http://cruncher.build.mozilla.org/buildapi/reports/revision/try/473137428a64 which at the time of posting this has O:5 in the results since 5 pending changes have been tacked on to it.
There are 2 distinct problems with the build runs staying incomplete, one of which I fixed:

1/ There was a bug in the pending-changes check, which I fixed (it did not properly verify that pending changes had no build requests created), e.g. see:
http://cruncher.build.mozilla.org/~anamarias/wsgi/reports/revision/try/9733bd66818e which now is complete

2/ Another, more recent problem: each of the other build runs truly has 2 pending changes, one for tryserver-linuxqt-opt-unittest and one for try-android-talos (and no pending build requests, no running build requests), e.g.

http://cruncher.build.mozilla.org/~anamarias/wsgi/reports/revision/try/afa1d1ecd9e5
• 329465(tryserver-linuxqt-opt-unittest,afa1d1ecd9e5), 02/06/11 08:47:01
• 329548(try-android-talos,afa1d1ecd9e5), 02/06/11 10:56:47 

The patch attached fixes problem 1/
Attachment #511822 - Flags: review?(catlee)
Attachment #511822 - Flags: checked-in?
Attachment #511822 - Flags: review?(catlee) → review+
Attachment #511822 - Flags: checked-in? → checked-in+
I can't load reports/endtoend/try or mozilla-central right now, which might be a regression from this landing today. Or cruncher might still be intermittently flaky. Please investigate.
Removing dependency now for the try results tools since those will poll the db instead of relying on the json reports.
No longer blocks: 430942
Assignee: astoica → catlee
Priority: P2 → P4
Assignee: catlee → nobody
Product: mozilla.org → Release Engineering
Found in triage.
Component: Other → Tools
Whiteboard: [cruncher][reports] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2477] [cruncher][reports]
Do we even look at the end2end reports here anymore?
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: Tools → General