Closed Bug 1600193 Opened 4 years ago Closed 4 years ago

Some android-hw raptor tasks hang with repeated "[raptor-runnerjs] checking results..." in logcat

Categories

(Testing :: Raptor, defect, P3)

Version 3
defect

Tracking

(firefox76 fixed)

RESOLVED FIXED
mozilla76
Tracking Status
firefox76 --- fixed

People

(Reporter: gbrown, Assigned: sparky)

References

Details

Attachments

(1 file)

In bug 1587207, we have noticed many android-hw task timeouts due to hangs. Logs are not consistent and there may be multiple causes of the hangs.

A subset of those cases are certain raptor tasks which all have many minutes of "[raptor-runnerjs] checking results..." logging at the end of the logcat.

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277867767&repo=autoland&lineNumber=1428
https://firefoxci.taskcluster-artifacts.net/Cf3sCf8bQtKRT6u3pf1ylw/0/public/test_info/logcat-ZY322LH8PW.log

11-24 06:02:01.461 11172 11172 I GeckoViewActivity: History state updated
11-24 06:02:01.505 11172 11172 D GeckoSession: handleMessage GeckoView:StateUpdated uri=null
11-24 06:02:01.506 11172 11172 I GeckoViewActivity: New Session state: {"scrolldata":{"zoom":{"resolution":0.3673469126224518,"displaySize":{"height":1536,"width":1080}}},"history":{"entries":[{"subframe":true,"persist":true,"cacheKey":0,"ID":1,"url":"http:\/\/127.0.0.1:39265\/Speedometer\/index.html?raptor","title":"Speedometer 1.0","docIdentifier":2147483650,"storagePrincipalToInherit_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","triggeringPrincipal_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","principalToInherit_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","resultPrincipalURI":null,"originalURI":"http:\/\/127.0.0.1:39265\/Speedometer\/index.html?raptor","docshellUUID":"{efa6d2d5-6220-4799-8c26-ce748f499a9b}"}],"requestedIndex":0,"fromIdx":-1,"index":1,"userContextId":0}}
11-24 06:02:01.617 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:01.618 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:01.869 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:01.869 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.122 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.122 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.373 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.374 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.625 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.626 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.879 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.880 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.130 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.131 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.382 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.382 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.633 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.634 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.884 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.885 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.136 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:04.136 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.161 11172 11187 D GeckoScreenOrientation: unlocking
11-24 06:02:04.161 11172 11187 D GeckoScreenOrientation: unlocking
11-24 06:02:04.387 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:04.388 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.639 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
...
11-24 06:12:08.960 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:12:09.212 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:12:09.212 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:12:09.464 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] checking results..."
11-24 06:12:09.464 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...

(In reply to Geoff Brown [:gbrown] from comment #0)

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277867767&repo=autoland&lineNumber=1428
https://firefoxci.taskcluster-artifacts.net/Cf3sCf8bQtKRT6u3pf1ylw/0/public/test_info/logcat-ZY322LH8PW.log

That logcat also has:

11-24 05:59:46.420 11172 11187 I Gecko   : console.log: "[raptor-runnerjs] now is 1574575186414, set raptor alarm raptor-page-timeout to expire at 1574576086414"

I make that out as 900 seconds or 15 minutes. But the task times out before 15 minutes has expired...is that timeout too long?

Priority: -- → P2

Note that we may miss to send some messages to the control server, and that we wait forever. I found this today and will fix it via bug 1617909. Lets hope that it is the underlying issue.

Depends on: 1617909
Priority: P2 → P3

Geoff, can you please check if that is still happening now with bug 1617909 fixed some days ago? Thanks.

Flags: needinfo?(gbrown)

For speedometer tests I see:

03-02 11:41:22.630  8317  8317 I GeckoViewActivity: Visited URL: http://127.0.0.1:41089/Speedometer/resources/todomvc/vanilla-examples/es2015-babel-webpack/dist/index.html
03-02 11:41:22.643  8317  8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]
03-02 11:41:22.643  8317  8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]
03-02 11:41:22.643  8317  8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]

Most likely those are related to loading the test page, and the test runner not correctly running and finishing the test. For unity-webgl we have a similar thing. See bug 1524545.

Depends on: 1524545

So Raptor sets a raptor-page-timeout timer via the WebExtension API, which basically should hard-stop a long running test to not cause such a task aborted failure. In the logcat I can see that the timer has been set, but it never breaks out the wait for results loop. We will have to check why that happens.

Geoff, how often do we see those failures? Do you report any of those, or are those just some examples from many others? I would like to know how important it is to get this fixed. Thanks.

Flags: needinfo?(gbrown)

I see this type of failure reported in bug 1589796 (when the task times out). I see about 13 such failures there in the last week. I think that's significant/bad, but not terrible.

Flags: needinfo?(gbrown)

I think if the raptor-page-timeout was less, or the task max-run-time was larger, we would see page timeouts rather than task timeouts - that might be a step in the right direction.

I agree with :gbrown, our page timeout is 15mins right now for this test:

03-23 12:04:51.105  7478  7494 I GeckoConsole: [raptor-runnerjs] getting test settings from control server
03-23 12:04:51.119  7478  7494 I Gecko   : console.info: "[raptor-runnerjs] test settings received: {\"raptor-options\": {\"expected_browser_cycles\": 1, \"subtest_unit\": \"score\", \"alert_threshold\": 2.0, \"type\": \"benchmark\", \"page_cycles\": 1, \"subtest_lower_is_better\": false, \"test_url\": \"http://<host>:<port>/unity-webgl/index.html?raptor\", \"page_timeout\": 900000, \"host\": \"127.0.0.1\", \"cold\": false, \"lower_is_better\": false, \"unit\": \"score\"}}"
03-23 12:04:51.120  7478  7494 I GeckoConsole: [raptor-runnerjs] test settings received: {"raptor-options": {"expected_browser_cycles": 1, "subtest_unit": "score", "alert_threshold": 2.0, "type": "benchmark", "page_cycles": 1, "subtest_lower_is_better": false, "test_url": "http://<host>:<port>/unity-webgl/index.html?raptor", "page_timeout": 900000, "host": "127.0.0.1", "cold": false, "lower_is_better": false, "unit": "score"}}
03-23 12:04:51.121  7478  7494 I Gecko   : console.info: "[raptor-runnerjs] test URL: http://127.0.0.1:42405/unity-webgl/index.html?raptor"
03-23 12:04:51.121  7478  7494 I GeckoConsole: [raptor-runnerjs] test URL: http://127.0.0.1:42405/unity-webgl/index.html?raptor
03-23 12:04:51.121  7478  7494 I Gecko   : console.info: "[raptor-runnerjs] using page timeout: 900000ms"
03-23 12:04:51.121  7478  7494 I GeckoConsole: [raptor-runnerjs] using page timeout: 900000ms

Looking at the logs for these tests, if we timeout at 7 minutes we should be fine. I'll make a patch for this.

This patch reduces the page load timeout for speedometer and unity-webgl tests to 7 minutes.

Assignee: nobody → gmierz2
Status: NEW → ASSIGNED
No longer depends on: 1524545, 1614275, 1617909
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fff410e32a31
Reduce page load timeouts for some raptor benchmark tests. r=whimboo,perftest-reviewers,AlexandruIonescu
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: