Closed Bug 1422219 Opened 7 years ago Closed 11 months ago

Intermittent ThreadUtils.IdleRunnableMethod | Value of: idleNoSetDeadline->mRunnableExecuted

Categories

(Core :: XPCOM, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell disabled][stockwell unknown])

Attachments

(1 file)

https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Latest failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed,busted,exception&classifiedState=unclassified&fromchange=d9eb5d5160d0bd758138a9e1da3cc12d8edb96fb&selectedJob=199259949

https://treeherder.mozilla.org/logviewer.html#?job_id=199259949&repo=autoland&lineNumber=316534

07:48:05  WARNING -  TEST-UNEXPECTED-FAIL | GTest unit test: failed
07:48:05     INFO -  Passed: 5399
07:48:05  WARNING -  Failed: 1
07:48:05  WARNING -  One or more unittests failed.
07:48:05     INFO -  Finished running GTest tests.
07:48:05    ERROR -  gtest TEST-UNEXPECTED-FAIL | gtest | test failed with return code 1
07:48:05    ERROR - Return code: 1
07:48:05     INFO - TinderboxPrint: gtest-gtest<br/>5399/<em class="testfail">1</em>
07:48:05    ERROR - # TBPL FAILURE #
07:48:05  WARNING - setting return code to 2
07:48:05    ERROR - The gtest suite: gtest ran with return status: FAILURE
07:48:05     INFO - Running post-action listener: _package_coverage_data
07:48:05     INFO - Running post-action listener: _resource_record_post_action
07:48:05     INFO - Running post-action listener: process_java_coverage_data
07:48:05     INFO - [mozharness: 2018-09-14 07:48:05.070000Z] Finished run-tests step (success)
07:48:05     INFO - Running post-run listener: _resource_record_post_run
07:48:05     INFO - Validating Perfherder data against Z:\task_1536909850\mozharness\external_tools\performance-artifact-schema.json
07:48:05     INFO - PERFHERDER_DATA: {"framework": {"name": "job_resource_usage"}, "suites": [{"subtests": [{"name": "cpu_percent", "value": 22.033513189448453}, {"name": "io_write_bytes", "value": 808357376}, {"name": "io.read_bytes", "value": 1480009216}, {"name": "io_write_time", "value": 16}, {"name": "io_read_time", "value": 21}], "extraOptions": ["taskcluster-c4.2xlarge"], "name": "gtest.gtest.gtest.overall"}, {"subtests": [{"name": "time", "value": 3.131999969482422}, {"name": "cpu_percent", "value": 9.93125}], "name": "gtest.gtest.gtest.install"}, {"subtests": [{"name": "time", "value": 0.4909999370574951}], "name": "gtest.gtest.gtest.stage-files"}, {"subtests": [{"name": "time", "value": 835.7730000019073}, {"name": "cpu_percent", "value": 22.101129518072298}], "name": "gtest.gtest.gtest.run-tests"}]}
07:48:05     INFO - Total resource usage - Wall time: 839s; CPU: 22.0%; Read bytes: 1480009216; Write bytes: 808357376; Read time: 21; Write time: 16
07:48:05     INFO - TinderboxPrint: CPU usage<br/>22.1%
07:48:05     INFO - TinderboxPrint: I/O read bytes / time<br/>1,480,009,216 / 21
07:48:05     INFO - TinderboxPrint: I/O write bytes / time<br/>808,357,376 / 16
07:48:05     INFO - TinderboxPrint: CPU idle<br/>5,233.4 (77.9%)
07:48:05     INFO - TinderboxPrint: CPU system<br/>844.9 (12.6%)
07:48:05     INFO - TinderboxPrint: CPU user<br/>629.1 (9.4%)
07:48:05     INFO - install - Wall time: 3s; CPU: 10.0%; Read bytes: 0; Write bytes: 83826688; Read time: 0; Write time: 0
07:48:05     INFO - stage-files - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
07:48:05     INFO - run-tests - Wall time: 836s; CPU: 22.0%; Read bytes: 1480009216; Write bytes: 551892992; Read time: 21; Write time: 15
07:48:05     INFO - Running post-run listener: copy_logs_to_upload_dir
07:48:05     INFO - Copying logs to upload dir...
07:48:05     INFO - mkdir: Z:\task_1536909850\build\upload\logs
07:48:05     INFO - Copying logs to upload dir...
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\localconfig.json"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_info.log"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_raw.log"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_warning.log"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_critical.log"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_error.log"
07:48:05     INFO - Return code: 0
07:48:05     INFO - Using _rmtree_windows ...
07:48:05     INFO - Running command: del /F /Q "Z:\task_1536909850\build\upload\logs\log_fatal.log"
07:48:05     INFO - Return code: 0
07:48:05  WARNING - returning nonzero exit status 2
[taskcluster 2018-09-14T07:48:06.003Z]    Exit Code: 2
[taskcluster 2018-09-14T07:48:06.003Z]    User Time: 15.6001ms
[taskcluster 2018-09-14T07:48:06.003Z]  Kernel Time: 0s
[taskcluster 2018-09-14T07:48:06.003Z]    Wall Time: 20m35.500314s
[taskcluster 2018-09-14T07:48:06.003Z]       Result: FAILED
[taskcluster 2018-09-14T07:48:06.003Z] === Task Finished ===
[taskcluster 2018-09-14T07:48:06.003Z] Task Duration: 20m35.504316s
[taskcluster 2018-09-14T07:48:06.681Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:07.708Z] Uploading artifact public/logs/log_critical.log from file logs\log_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:08.218Z] Uploading artifact public/logs/log_error.log from file logs\log_error.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:08.756Z] Uploading artifact public/logs/log_fatal.log from file logs\log_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:09.257Z] Uploading artifact public/logs/log_info.log from file logs\log_info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:10.974Z] Uploading artifact public/logs/log_raw.log from file logs\log_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:11.684Z] Uploading artifact public/logs/log_warning.log from file logs\log_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:12.226Z] Uploading artifact public/test_info/resource-usage.json from file build\blobber_upload_dir\resource-usage.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:12.835Z] Uploading artifact public/test_info/system-info.log from file build\blobber_upload_dir\system-info.log with content encoding "gzip", mime type "text/plain" and expiry 2019-09-14T06:18:51.224Z
[taskcluster 2018-09-14T07:48:13.616Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/H4bjvCk4TiK9gNUEvo7niQ/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-09-14T06:18:51.224Z
[taskcluster:error] exit status 2
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Whiteboard: [retriggered]

I have a patch at https://phabricator.services.mozilla.com/D28085 which should fix this issue for windows10-aarch64.

Flags: needinfo?(egao)
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

There have been 50 total failures, in the last 7 days (recent failure log).
Affected platforms are:

  • linux1804-32-shippable-qr
  • linux1804-64-asan-qr
  • linux1804-64-qr opt & debug
  • macosx1015-64-qr debug

Nika, could you please redirect this to someone who can take a look?

Flags: needinfo?(nika)

It looks like the failure is occuring because we dispatch an idle runnable to the current thread, and then wait for all events on the thread to be processed with NS_ProcessPendingEvents, but never actually run it. It seems surprising to me that we'd return from NS_ProcessPendingEvents without processing idle runnables.

ni? :bas who did a lot of the work on the TaskController stuff and would know if this is somehow expected behaviour.

Flags: needinfo?(nika) → needinfo?(bas)

Olli did most of the work for the idle handling. I don't think it's unexpected for us to return if we -don't- consider ourselves idle yet (this can be desired if other processes are not idle for example). But I believe logic should wake up the event loop again once we are idle.

Flags: needinfo?(bas) → needinfo?(bugs)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]
Assignee: nobody → apavel
Assignee: apavel → nobody
Keywords: leave-open
Whiteboard: [retriggered][stockwell needswork:owner] → [retriggered][stockwell needswork:owner] [stockwell disabled]
Pushed by ctuns@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c3bbb764d3b4
disable ThreadUtils.IdleRunnableMethod due to frequent failures r=intermittent-reviewers,jmaher
Severity: normal → --

I've tried to reproduce this locally, but no luck yet.

Given that the most recent comment is over a year old, I'm marking this as WFM.

Status: REOPENED → RESOLVED
Closed: 5 years ago11 months ago
Flags: needinfo?(smaug)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: