TBPL should report on individual suite results.

RESOLVED INCOMPLETE

Status

RESOLVED INCOMPLETE
8 years ago
4 years ago

People

(Reporter: catlee, Assigned: Swatinem)

Tracking

Trunk
x86_64
Linux

Details

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

8 years ago
In suites like mochitest-other, several individual suites are run together.  They're currently all lumped together under M(oth).

It would be nice to break each suite out into its own box somehow.

This is also blocking us merging the crashtest and jsreftest suites together, since developers want the results reported separately.
My thought was to report it something like we currently do with the split mochitests. Right now we have M(1 2 3 4 5 oth), so it's not much of a stretch to split that out to M(1 2 3 4 5 B C A P) (browser, chrome, a11y, plugins).

We could similarly group the 3 reftest suites, as R(R C J).
(Assignee)

Comment 2

8 years ago
Grouping the reftests should be straightforward. Splitting the results however would require a complete rework of our “run” concept in tbpl.
(Assignee)

Comment 3

8 years ago
Created attachment 474653 [details] [diff] [review]
group reftest suites

As suggested by comment 1, this groups the reftest suites, alphabetically.
Assignee: nobody → arpad.borsos
Status: NEW → ASSIGNED
Attachment #474653 - Flags: review?(mstange)
(Assignee)

Comment 4

8 years ago
(In reply to comment #3)
And it does only group C J R, not X. Maybe that should also be in the group?
(Assignee)

Comment 5

8 years ago
Created attachment 474655 [details] [diff] [review]
do not use logID as runID
Attachment #474655 - Flags: review?(mstange)
(Assignee)

Comment 6

8 years ago
Created attachment 474656 [details] [diff] [review]
do not use logID as runID

Was missing a change in AddCommentUI.
Attachment #474655 - Attachment is obsolete: true
Attachment #474656 - Flags: review?(mstange)
Attachment #474655 - Flags: review?(mstange)
Comment on attachment 474653 [details] [diff] [review]
group reftest suites

+    return machineType
missing semicolon

r=me, though this doesn't really have anything to do with the issue this bug was filed about, does it?
Attachment #474653 - Flags: review?(mstange) → review+
Comment on attachment 474656 [details] [diff] [review]
do not use logID as runID

diff --git a/js/TinderboxJSONUser.js b/js/TinderboxJSONUser.js
-  
-  getBuildScrape: function TinderboxJSONUser_getBuildScrape(td, machineRunID) {
-    return td.scrape[machineRunID];
-  },
-  

I know what you must be thinking... this was a much longer function when we were still parsing the Tinderbox HTML.

+      var runID = index++;

So this is preparation for using the buildbot build ID in the future? Great!
Attachment #474656 - Flags: review?(mstange) → review+
(Assignee)

Comment 9

8 years ago
Created attachment 474672 [details] [diff] [review]
split mochitest-other into individual tests

I’m not too happy about this patch, it introduces a lot of quirks and does not play very well with the legend and the machine type handling.
This also fixes an issue I noticed in the “group reftest suites” patch that had duplicated names in the title of links.

I also noticed that B for browser-chrome collides with the B for build.
The way stars work is also somehow messed up. You can star without problems, but the star only shows up on the individual test until you refresh, then the same star is shown on all the tests. This may also lead to accidental double starring.
It does however show you right away which one of the other tests is failing.
Attachment #474672 - Flags: feedback?(mstange)
(Assignee)

Comment 10

8 years ago
Created attachment 474673 [details] [diff] [review]
group reftest suites (fixed)
Attachment #474653 - Attachment is obsolete: true
Attachment #474673 - Flags: review+
(Assignee)

Comment 12

8 years ago
(In reply to comment #9)
> I also noticed that B for browser-chrome collides with the B for build.

And the C is ambiguous for crashtest or chrome.
And I'm betting that this is the reason that the summary for every orange build I've looked at today is "invalid id" making it impossible to star anything with tbpl.

Markus, if you're still awake, please undeploy this (or whatever else you might have pulled overnight, if it isn't actually this).
undeployed
(In reply to comment #4)
> (In reply to comment #3)
> And it does only group C J R, not X. Maybe that should also be in the group?

X is xpcshell, that's totally different.

(In reply to comment #2)
> Grouping the reftests should be straightforward. Splitting the results however
> would require a complete rework of our “run” concept in tbpl.

That's unfortunate, since splitting the mochitest suites was the real impetus of this bug. bug 586418 aims to reduce wasted setup/teardown time by combining some suites, but we'd like the reporting to be better.
(Assignee)

Comment 17

8 years ago
Created attachment 475088 [details] [diff] [review]
split mochitest-other into individual tests
Attachment #474672 - Attachment is obsolete: true
Attachment #475088 - Flags: feedback?(mstange)
Attachment #474672 - Flags: feedback?(mstange)
Comment on attachment 475088 [details] [diff] [review]
split mochitest-other into individual tests

I think this is the right approach. It obviously requires some more work to iron out the kinks you mentioned.
Attachment #475088 - Flags: feedback?(mstange) → feedback+
Comment on attachment 474656 [details] [diff] [review]
do not use logID as runID

I think this patch caused a pretty serious regression in TBPL, which is that if you have a build highlighted and TBPL reloads data from tinderbox, the build that's highlighted will jump to some other build.  We need to use a persistent ID here; using a counter that changes each reload doesn't work.
Attachment #474656 - Flags: review-
(Assignee)

Comment 20

8 years ago
http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/98535a5c1fd3 Backed out.

This will probably re-land once we don’t reload the whole list all over, instead just appending new runs.
Also, I'm skeptical that this is a good approach to take at this time.  It's going to break starring.  And, furthermore, TBPL isn't really where we should be trying to split up build logs, which is quite expensive.  I think splitting the runs into the appropriate units that we want to display belongs on the server side.
... I should probably explain "break starring" slightly more:  right now we store stars on tinderbox, which means that a star is associated with whatever tinderbox considers a build.  If we do this, we'll end up starring all of the split builds when we attempt to star one.

Now, at some point we should move starring off of tinderbox.  But there's still the issue that TPBL is slow enough already, and we're better off doing expensive splitting operations on the server side.
The immediate use case here is that in bug 586418, we want to combine crashtest+jsreftest into a single run without making the data display worse than it currently is. FWIW, for what you're describing, if we split the display of mochitest-other out, it's not going to be worse than it currently is.
(In reply to comment #23)
> The immediate use case here is that in bug 586418, we want to combine
> crashtest+jsreftest into a single run without making the data display worse
> than it currently is. FWIW, for what you're describing, if we split the display
> of mochitest-other out, it's not going to be worse than it currently is.

It will be worse, since we tend to follow the "don't star a build unless you've figured out everything that's wrong with it" rule.  With this change, people will be unable to follow that rule.
(Assignee)

Comment 25

8 years ago
I already mentioned the starring problem in comment 9. Moving it to the server side definetely leads to a much clearer client. But we are not quite there yet.
(In reply to comment #0)
> In suites like mochitest-other, several individual suites are run together. 
> They're currently all lumped together under M(oth).
> 
> It would be nice to break each suite out into its own box somehow.

What should we do with the logs? Should we make the server side log parsing look for separators between the individual log parts and break them up into separate logs?

Comment 27

8 years ago
(In reply to comment #22)
> ... I should probably explain "break starring" slightly more:  right now we
> store stars on tinderbox, which means that a star is associated with
> whatever tinderbox considers a build.  If we do this, we'll end up starring
> all of the split builds when we attempt to star one.
> 
> Now, at some point we should move starring off of tinderbox.  But there's
> still the issue that TPBL is slow enough already, and we're better off doing
> expensive splitting operations on the server side.

So, we've moved starring off of tinderbox now with the advent of Orange Factor.  Is there anything else blocking this bug from landing?
You've got your tenses mixed up: "we will have moved starring off of tinderbox in the future, so at that time will there be anything else?"

I think the answer is "that depends on whether the future buildbot-only version of tbpl knows how to split apart multiple arbitrary suites in a single log, by having buildbot always insert the (currently undefined) One True Separator between suites."

The other problem that I don't see quite explicitly brought up yet is that currently a log has a status, busted or testfailed (or, I've heard, success), which would need to be output for each suite rather than once per log. Quite often it's the result of the run's status rather than a suite's status, so I suspect releng will need to rework all the stuff that currently decides whether a testfailed in a suite turns the whole log to testfailed, or doesn't because an infra failure in another buildstep made it busted rather than just testfailed - some failures, like failing to download the build, make every test suite in the run busted, but others, like failing to unpack jsreftests in a combined run of J and X, would only bust part.

Comment 29

8 years ago
Where are we at in this bug?

This is currently blocking me from merging unit test jobs.

It seems that grouping or no grouping does not affect when dealing with talos runs (as I discovered it when adding tp5 to the tp4 run).
(In reply to comment #29)
> Where are we at in this bug?

I haven't done any work on this yet.

> This is currently blocking me from merging unit test jobs.

Which ones? Still crashtest and jsreftest, as stated in comment 0?

> It seems that grouping or no grouping does not affect when dealing with
> talos runs (as I discovered it when adding tp5 to the tp4 run).

Well, depends on what you mean. You didn't get separate "T"s for tp4 and tp5; they shared the same T. If you don't mind having the shared crashtest/jsreftest run to have a shared "CJ", for example, then we're already all set. But splitting up the "letters" is what this bug was originally about, and that's harder.

If we want split letters, we'll need to solve the problems philor mentioned in comment 28: We need a decision about the log separator (this is the easy part), and we need to teach the buildbot scripts that create the build json how to specify what parts a build consists of, along with a result code for each one.

This need to be done on the buildbot side before anything can be done on the TBPL side.

Updated

8 years ago
No longer blocks: 659328
I think our requirements have changed somewhat here - and any future use cases will need to be handled by treeherder anyway. Closing per IRC discussions with Swatinem :-)
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INCOMPLETE
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.