Some android-hw raptor tasks hang with repeated "[raptor-runnerjs] checking results..." in logcat
Categories
(Testing :: Raptor, defect, P3)
Tracking
(firefox76 fixed)
Tracking | Status | |
---|---|---|
firefox76 | --- | fixed |
People
(Reporter: gbrown, Assigned: sparky)
References
Details
Attachments
(1 file)
In bug 1587207, we have noticed many android-hw task timeouts due to hangs. Logs are not consistent and there may be multiple causes of the hangs.
A subset of those cases are certain raptor tasks which all have many minutes of "[raptor-runnerjs] checking results..." logging at the end of the logcat.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277867767&repo=autoland&lineNumber=1428
https://firefoxci.taskcluster-artifacts.net/Cf3sCf8bQtKRT6u3pf1ylw/0/public/test_info/logcat-ZY322LH8PW.log
11-24 06:02:01.461 11172 11172 I GeckoViewActivity: History state updated
11-24 06:02:01.505 11172 11172 D GeckoSession: handleMessage GeckoView:StateUpdated uri=null
11-24 06:02:01.506 11172 11172 I GeckoViewActivity: New Session state: {"scrolldata":{"zoom":{"resolution":0.3673469126224518,"displaySize":{"height":1536,"width":1080}}},"history":{"entries":[{"subframe":true,"persist":true,"cacheKey":0,"ID":1,"url":"http:\/\/127.0.0.1:39265\/Speedometer\/index.html?raptor","title":"Speedometer 1.0","docIdentifier":2147483650,"storagePrincipalToInherit_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","triggeringPrincipal_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","principalToInherit_base64":"eyIxIjp7IjAiOiJtb3otZXh0ZW5zaW9uOi8vOWVjZmQ2MDUtYmE4MC00YzlmLWJlN2UtNzhhYWFhMDkyZmY1LyJ9fQ==","resultPrincipalURI":null,"originalURI":"http:\/\/127.0.0.1:39265\/Speedometer\/index.html?raptor","docshellUUID":"{efa6d2d5-6220-4799-8c26-ce748f499a9b}"}],"requestedIndex":0,"fromIdx":-1,"index":1,"userContextId":0}}
11-24 06:02:01.617 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:01.618 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:01.869 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:01.869 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.122 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.122 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.373 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.374 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.625 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.626 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:02.879 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:02.880 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.130 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.131 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.382 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.382 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.633 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.634 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:03.884 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:03.885 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.136 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:04.136 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.161 11172 11187 D GeckoScreenOrientation: unlocking
11-24 06:02:04.161 11172 11187 D GeckoScreenOrientation: unlocking
11-24 06:02:04.387 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:02:04.388 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:02:04.639 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
...
11-24 06:12:08.960 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:12:09.212 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:12:09.212 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
11-24 06:12:09.464 11172 11187 I Gecko : console.log: "[raptor-runnerjs] checking results..."
11-24 06:12:09.464 11172 11187 I GeckoConsole: [raptor-runnerjs] checking results...
Reporter | ||
Comment 1•4 years ago
|
||
Other similar examples:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277981284&repo=autoland&lineNumber=1428
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278601336&repo=autoland&lineNumber=1427
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278635662&repo=autoland&lineNumber=1428
Reporter | ||
Comment 2•4 years ago
|
||
Reporter | ||
Comment 3•4 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #0)
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277867767&repo=autoland&lineNumber=1428
https://firefoxci.taskcluster-artifacts.net/Cf3sCf8bQtKRT6u3pf1ylw/0/public/test_info/logcat-ZY322LH8PW.log
That logcat also has:
11-24 05:59:46.420 11172 11187 I Gecko : console.log: "[raptor-runnerjs] now is 1574575186414, set raptor alarm raptor-page-timeout to expire at 1574576086414"
I make that out as 900 seconds or 15 minutes. But the task times out before 15 minutes has expired...is that timeout too long?
Updated•4 years ago
|
Reporter | ||
Comment 4•4 years ago
|
||
With recent changes in bug 1595129, logs seem more consistent, but hangs continue. Recent failures:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=281778506&repo=autoland&lineNumber=1461
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=281776871&repo=autoland&lineNumber=2547
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=281747007&repo=autoland&lineNumber=1462
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=281736715&repo=autoland&lineNumber=1461
Most of these hangs appear to be in raptor-unity-webgl-geckoview-e10s.
Reporter | ||
Comment 5•4 years ago
|
||
Reporter | ||
Comment 6•4 years ago
|
||
Still happening...
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290304089&repo=autoland&lineNumber=1461
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290281405&repo=autoland&lineNumber=1461
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290241035&repo=autoland&lineNumber=1461
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290105318&repo=mozilla-central&lineNumber=1493
Comment 7•4 years ago
|
||
Note that we may miss to send some messages to the control server, and that we wait forever. I found this today and will fix it via bug 1617909. Lets hope that it is the underlying issue.
Comment 8•4 years ago
|
||
Geoff, can you please check if that is still happening now with bug 1617909 fixed some days ago? Thanks.
Reporter | ||
Comment 9•4 years ago
|
||
Yes, it is still happening today.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291240560&repo=mozilla-central&lineNumber=1462
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291240583&repo=mozilla-central&lineNumber=1493
Comment 10•4 years ago
|
||
For speedometer tests I see:
03-02 11:41:22.630 8317 8317 I GeckoViewActivity: Visited URL: http://127.0.0.1:41089/Speedometer/resources/todomvc/vanilla-examples/es2015-babel-webpack/dist/index.html
03-02 11:41:22.643 8317 8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]
03-02 11:41:22.643 8317 8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]
03-02 11:41:22.643 8317 8332 E GeckoConsole: [JavaScript Error: "uncaught exception: undefined"]
Most likely those are related to loading the test page, and the test runner not correctly running and finishing the test. For unity-webgl we have a similar thing. See bug 1524545.
Reporter | ||
Comment 11•4 years ago
|
||
Recent failures:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=294327286&repo=mozilla-central&lineNumber=1513
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=294277451&repo=mozilla-central&lineNumber=1514
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=294241916&repo=mozilla-central&lineNumber=1483
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=294120387&repo=mozilla-central&lineNumber=1513
Comment 12•4 years ago
|
||
So Raptor sets a raptor-page-timeout
timer via the WebExtension API, which basically should hard-stop a long running test to not cause such a task aborted failure. In the logcat I can see that the timer has been set, but it never breaks out the wait for results loop. We will have to check why that happens.
Comment 13•4 years ago
|
||
Geoff, how often do we see those failures? Do you report any of those, or are those just some examples from many others? I would like to know how important it is to get this fixed. Thanks.
Reporter | ||
Comment 14•4 years ago
|
||
I see this type of failure reported in bug 1589796 (when the task times out). I see about 13 such failures there in the last week. I think that's significant/bad, but not terrible.
Reporter | ||
Comment 15•4 years ago
|
||
I think if the raptor-page-timeout was less, or the task max-run-time was larger, we would see page timeouts rather than task timeouts - that might be a step in the right direction.
Assignee | ||
Comment 16•4 years ago
|
||
I agree with :gbrown, our page timeout is 15mins right now for this test:
03-23 12:04:51.105 7478 7494 I GeckoConsole: [raptor-runnerjs] getting test settings from control server
03-23 12:04:51.119 7478 7494 I Gecko : console.info: "[raptor-runnerjs] test settings received: {\"raptor-options\": {\"expected_browser_cycles\": 1, \"subtest_unit\": \"score\", \"alert_threshold\": 2.0, \"type\": \"benchmark\", \"page_cycles\": 1, \"subtest_lower_is_better\": false, \"test_url\": \"http://<host>:<port>/unity-webgl/index.html?raptor\", \"page_timeout\": 900000, \"host\": \"127.0.0.1\", \"cold\": false, \"lower_is_better\": false, \"unit\": \"score\"}}"
03-23 12:04:51.120 7478 7494 I GeckoConsole: [raptor-runnerjs] test settings received: {"raptor-options": {"expected_browser_cycles": 1, "subtest_unit": "score", "alert_threshold": 2.0, "type": "benchmark", "page_cycles": 1, "subtest_lower_is_better": false, "test_url": "http://<host>:<port>/unity-webgl/index.html?raptor", "page_timeout": 900000, "host": "127.0.0.1", "cold": false, "lower_is_better": false, "unit": "score"}}
03-23 12:04:51.121 7478 7494 I Gecko : console.info: "[raptor-runnerjs] test URL: http://127.0.0.1:42405/unity-webgl/index.html?raptor"
03-23 12:04:51.121 7478 7494 I GeckoConsole: [raptor-runnerjs] test URL: http://127.0.0.1:42405/unity-webgl/index.html?raptor
03-23 12:04:51.121 7478 7494 I Gecko : console.info: "[raptor-runnerjs] using page timeout: 900000ms"
03-23 12:04:51.121 7478 7494 I GeckoConsole: [raptor-runnerjs] using page timeout: 900000ms
Assignee | ||
Comment 17•4 years ago
|
||
Looking at the logs for these tests, if we timeout at 7 minutes we should be fine. I'll make a patch for this.
Assignee | ||
Comment 18•4 years ago
|
||
This patch reduces the page load timeout for speedometer and unity-webgl tests to 7 minutes.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 19•4 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fff410e32a31 Reduce page load timeouts for some raptor benchmark tests. r=whimboo,perftest-reviewers,AlexandruIonescu
Comment 20•4 years ago
|
||
bugherder |
Description
•