Closed Bug 1163088 Opened 9 years ago Closed 9 years ago

[raptor] run gaia raptor suite 6x right off, and use median results

Categories

(Firefox OS Graveyard :: Gaia::PerformanceTest, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rwood, Assigned: rwood)

Attachments

(3 files)

When the raptor emulator tests are running and reporting to treeherder, we want to be sure a flagged regression is a valid regression. Running the test once on the reference build and once on the tester build and comparing the results is not always definitive: an apparent regression can be caused by environmental factors rather than by the change under test.

In order to make the results more reliable, implement raptor suite repeats. Run the suite once (launch test of 30 runs, on reference build and on tester build). If the results indicate no regression, finish and mark the result task green.

If the first suite results indicate a possible regression, then repeat the suite 6 more times concurrently (on the same builds). Then calculate the median results for the reference build and tester build from all six runs, and compare. If this result indicates a regression, flag the result task as red; otherwise mark it green. This way an indicated regression is well proven before being flagged (thanks for the input :jmaher).
As discussed with James, instead of waiting for the first suite to finish and then kicking off more runs if a regression is detected, just launch the suite 6x right from the start. Use the median results to check for a performance regression. Since the suites run concurrently, the testing time won't be extended, yet we can be confident that a flagged regression is valid.
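The median comparison described above could be sketched as follows. This is a minimal Python illustration, not the raptor-results code: the function name `is_regression`, the 10% threshold, and the sample numbers are all assumptions made for the example.

```python
from statistics import median


def is_regression(base_runs, patch_runs, threshold=0.10):
    """Return True if the median patch result exceeds the median base
    result by more than `threshold` (a fraction; hypothetical default).

    Each argument holds one launch-time value (ms) per suite run.
    """
    base_median = median(base_runs)
    patch_median = median(patch_runs)
    return patch_median > base_median * (1 + threshold)


# Six hypothetical suite results per build; the patch set has one noisy run.
base = [1000, 1010, 990, 1005, 995, 1002]
patch = [1008, 1400, 1001, 997, 1012, 1003]
print(is_regression(base, patch))  # False: the 1400 ms outlier does not move the median
```

Using the median rather than a single run means one run disturbed by the environment does not by itself turn the result task red.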
Summary: [raptor] repeat raptor suite several times if a regression is indicated → [raptor] run gaia raptor suite 6x right off, and use median results
Update raptor-results docker image code to:
- receive results for a variable number of suite runs (variable number of base and patch taskIds)
- use the median result in each set (base and patch results) to check for the performance regression
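One way to accept a variable number of base and patch taskIds is sketched below. How the real raptor-results image receives its taskIds is not stated in this bug, so the `--base` and `--patch` flags and the `parse_task_ids` helper are purely hypothetical.

```python
import argparse


def parse_task_ids(argv):
    """Parse one or more base taskIds and one or more patch taskIds.

    The flag names are hypothetical; they only illustrate accepting a
    variable-length list for each build.
    """
    parser = argparse.ArgumentParser(description="Collect raptor suite taskIds")
    parser.add_argument("--base", nargs="+", required=True,
                        help="taskIds of the suite runs on the reference build")
    parser.add_argument("--patch", nargs="+", required=True,
                        help="taskIds of the suite runs on the tester build")
    args = parser.parse_args(argv)
    return args.base, args.patch


# Example with three placeholder taskIds per build.
base_ids, patch_ids = parse_task_ids(
    ["--base", "b1", "b2", "b3", "--patch", "p1", "p2", "p3"]
)
print(len(base_ids), len(patch_ids))  # 3 3
```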
Attachment #8607728 - Flags: review?(eperelman)
Comment on attachment 8607728 [details] [review]
https://github.com/rwood-moz/raptor-results/pull/2

r+ with comments addressed.
Attachment #8607728 - Flags: review?(eperelman) → review+
Thanks Eli. The raptor-results docker image portion of this bug has been merged:

https://github.com/rwood-moz/raptor-results/commit/e8191e782cc24160e1456267b535177285280403
For the raptor-results image, I hadn't committed the post-review updates; that is done now:

https://github.com/rwood-moz/raptor-results/commit/12a286f039c85417368e7df226269ecc812c5ca7
Taskcluster raptor decision task code, updated to generate the appropriate raptor tasks for N suite iterations
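Generating tasks for N suite iterations might look roughly like the sketch below. The dictionaries are simplified placeholders and do not follow the real Taskcluster task schema; the labels, the `build_suite_tasks` name, and the single results task that depends on every run are assumptions for illustration.

```python
def build_suite_tasks(iterations):
    """Return placeholder task descriptions: one base and one patch suite
    task per iteration, plus one results task that depends on all of them.
    """
    tasks = []
    run_labels = {"base": [], "patch": []}
    for i in range(1, iterations + 1):
        for build in ("base", "patch"):
            label = f"raptor-{build}-{i}"  # hypothetical naming scheme
            tasks.append({"label": label, "build": build, "dependencies": []})
            run_labels[build].append(label)
    # The results task compares medians once every suite run has finished.
    tasks.append({
        "label": "raptor-results",
        "build": None,
        "dependencies": run_labels["base"] + run_labels["patch"],
    })
    return tasks


tasks = build_suite_tasks(6)
print(len(tasks))  # 13: six base runs, six patch runs, one results task
```

Because the suite tasks have no dependencies on each other, a scheduler is free to run them concurrently, which matches the goal of not extending total testing time.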
Attachment #8608826 - Flags: review?(garndt)
Blocks: 1166776
Attachment #8608826 - Flags: review?(garndt) → review+
Keywords: checkin-needed
Landed myself, don't know why autolander doesn't seem to work for my raptor patches.

https://github.com/mozilla-b2g/gaia/commit/ea07a22e99979fbce2df447df73caf76c351909e
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED