Closed Bug 1578542 Opened 5 years ago Closed 4 years ago

Raptor tests with firefox crashes are not failing

Categories

(Testing :: Raptor, defect, P2)

Version 3
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sparky, Assigned: stephend)

Details

(Keywords: perf-alert)

Attachments

(2 files)

I was hitting a shutdown crash locally that required me to add the {"browser.sessionstore.resume_from_crash": false} pref and I thought it was quite curious that we aren't seeing these errors in raptor. From the looks of it, our tests don't always fail when a crash occurs. The task selected here has a crash (note the .dmp file in job details) but it didn't fail: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=raptor&selectedJob=264773614

These crashes are most likely unrelated to the pref requirement I needed locally. The perfherder data for the other subtests should still be good, but I think it would be better if we fail the task at the end if one of the subtests failed to complete.

Assignee: nobody → stephen.donner
Status: NEW → ASSIGNED

I think this is not only the right approach, but also the right level (base/common/perf/raptor, for the various user.js places to override). A couple of suites set this pref in the same place as my patch (JS and web-platform tests): https://searchfox.org/mozilla-central/search?q=user_pref(%22browser.sessionstore.resume_from_crash%22%2C+false)%3B&case=false&regexp=false&path=

A whole host of others have it set, too: https://searchfox.org/mozilla-central/search?q=browser.sessionstore.resume_from_crash&path=

Pushed by sdonner@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/85971c648a8e
Raptor tests with Firefox crashes are not failing. r=perftest-reviewers,sparky

Adding a leave-open since there may be other things we can do to help this issue, and in case we find more quiet crashes that don't result in failures.

Keywords: leave-open

== Change summary for alert #23218 (as of Wed, 25 Sep 2019 07:43:19 GMT) ==

Improvements:

3% raptor-tp6-microsoft-firefox fcp linux64-shippable opt 230.48 -> 224.17

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23218

:stephend Should we expect performance improvements from this bug?

Flags: needinfo?(stephen.donner)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #7)

== Change summary for alert #23218 (as of Wed, 25 Sep 2019 07:43:19 GMT) ==

Improvements:

3% raptor-tp6-microsoft-firefox fcp linux64-shippable opt 230.48 -> 224.17

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23218

:stephend Should we expect performance improvements from this bug?

Doesn't surprise me, now that you ask, but :sparky probably has better insight, or the ability to tell, too. I'd think that not having to spend time to enter Session Restore, etc. -- and then, a short while later tear down, and restart the browser -- would save us time, yeah?

Flags: needinfo?(stephen.donner) → needinfo?(gmierz2)

I can confirm that this is the bug that caused the improvment

Updated the alert to pinpoint the correct commit

== Change summary for alert #23238 (as of Thu, 26 Sep 2019 10:32:05 GMT) ==

Improvements:

3% raptor-tp6-microsoft-firefox linux64-shippable opt 235.69 -> 228.53
3% raptor-tp6-microsoft-firefox fcp linux64-shippable opt 230.48 -> 224.17
3% raptor-tp6-microsoft-firefox fcp linux64-shippable opt 228.71 -> 222.75

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23238

We also got a regression:
== Change summary for alert #23238 (as of Thu, 26 Sep 2019 10:32:05 GMT) ==

Regressions:

2% raptor-tp6-facebook-firefox-cold windows10-64-shippable opt 419.29 -> 427.37

I was quite surprised by these, I didn't think sessionstore would affect page-load times - but as :stephend and :jesup have mentioned, it is somewhat expected. There is some work on sessionstore in fission, which should help with the regression (I don't know what bug it is, but if anyone does it would be great to reference it here).

I spent some time looking into the code for crashes in raptor and found that RaptorDesktop never calls self.check_for_crashes in self.run_test_teardown so this would be the next thing we should try here: https://dxr.mozilla.org/mozilla-central/source/testing/raptor/raptor/raptor.py#839

EDIT: Nevermind, found the call here: https://dxr.mozilla.org/mozilla-central/source/testing/raptor/raptor/raptor.py#218

Flags: needinfo?(gmierz2)

The big problem here is that the return code in raptor-main is highly dependent on the return code from process_results which has to catch all the issues. I think it would be better if we eventually add a self.return_code property to the PerfTest class and then return that in os.sys.exit(...) instead of relying on the results and ouput handlers to catch the failures.

I'll be uploading a patch soon that will solve the current issue we are having which is that when a crash or application timeout occurs, and there are no results for a certain subtest in the results handler, it does not report an error when it should.

This patch adds the ability to check if all subtests were run and ensures that we fail the task when at least one test is missing from the expected set of tests. It also modifies how the return code is computed in output.output to check that all checks passed and that none of them failed.

Pushed by gmierz2@outlook.com:
https://hg.mozilla.org/integration/autoland/rev/be910f652c9d
Fail the tests when a subtest is missing from the results. r=perftest-reviewers,stephendonner,rwood

Hey :sparky, I see a couple of patches have landed (awesome) can this bug be closed now? Or are there other areas where crashes need to be captured? Thanks!

Flags: needinfo?(gmierz2)

I think we should leave this open for now, I imagine that there's at least one other area that we haven't fixed yet. The ideal solution would be to solve bug 1587500 and set the return code anytime a crash is detected by mozcrash.

Flags: needinfo?(gmierz2)
Priority: P1 → P2

Resolving as fixed since most of the issues are solved. We can open follow-up bugs for additional work.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Keywords: perf-alert
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: