Closed Bug 1453863 Opened 6 years ago Closed 6 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | dom/events/test/test_wheel_default_action.html | application timed out after 370 seconds with no output

Categories

(Core :: Panning and Zooming, defect, P2)

61 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kats)

References

Details

(Keywords: intermittent-failure)

Crash Data

Attachments

(1 file, 1 obsolete file)

Filed by: rgurzau [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=173410896&repo=autoland

https://queue.taskcluster.net/v1/task/M83CdU08RqmjEdGCeELEfg/runs/0/artifacts/public/logs/live_backing.log

[task 2018-04-13T02:07:43.265Z] 02:07:43  WARNING - runtests.py | Failed to get app exit code - running/crashed?
[task 2018-04-13T02:07:43.267Z] 02:07:43     INFO - TEST-INFO | Main app process: exit 0
[task 2018-04-13T02:07:43.268Z] 02:07:43     INFO - runtests.py | Application ran for: 0:11:21.305372
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - zombiecheck | Reading PID log: /tmp/tmppZdDf0pidlog
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - ==> process 1665 launched child process 1685
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - ==> process 1665 launched child process 1710
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - ==> process 1665 launched child process 1738
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - zombiecheck | Checking for orphan process with PID: 1738
[task 2018-04-13T02:07:43.272Z] 02:07:43     INFO - zombiecheck | Checking for orphan process with PID: 1685
[task 2018-04-13T02:07:43.273Z] 02:07:43     INFO - zombiecheck | Checking for orphan process with PID: 1710
[task 2018-04-13T02:07:43.273Z] 02:07:43     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmp7h19jw.mozrunner/minidumps/61911ce0-0e4d-8c4e-c5be-417b87575a34.dmp /builds/worker/workspace/build/symbols
[task 2018-04-13T02:07:43.288Z] 02:07:43     INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/61911ce0-0e4d-8c4e-c5be-417b87575a34.dmp
[task 2018-04-13T02:07:43.290Z] 02:07:43     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [None]
[task 2018-04-13T02:07:43.291Z] 02:07:43     INFO - Crash dump filename: /tmp/tmp7h19jw.mozrunner/minidumps/61911ce0-0e4d-8c4e-c5be-417b87575a34.dmp
[task 2018-04-13T02:07:43.293Z] 02:07:43     INFO - stderr from minidump_stackwalk:
[task 2018-04-13T02:07:43.294Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4359: INFO: Minidump opened minidump /tmp/tmp7h19jw.mozrunner/minidumps/61911ce0-0e4d-8c4e-c5be-417b87575a34.dmp
[task 2018-04-13T02:07:43.295Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4479: INFO: Minidump not byte-swapping minidump
[task 2018-04-13T02:07:43.297Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 15 not present
[task 2018-04-13T02:07:43.298Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 7 not present
[task 2018-04-13T02:07:43.299Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 7 not present
[task 2018-04-13T02:07:43.300Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 1197932545 not present
[task 2018-04-13T02:07:43.302Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 6 not present
[task 2018-04-13T02:07:43.302Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 1197932546 not present
[task 2018-04-13T02:07:43.303Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 4 not present
[task 2018-04-13T02:07:43.304Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 5 not present
[task 2018-04-13T02:07:43.305Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4956: INFO: GetStream: type 3 not present
[task 2018-04-13T02:07:43.306Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump_processor.cc:152: ERROR: Minidump /tmp/tmp7h19jw.mozrunner/minidumps/61911ce0-0e4d-8c4e-c5be-417b87575a34.dmp has no thread list
[task 2018-04-13T02:07:43.307Z] 02:07:43     INFO - 2018-04-13 02:07:43: stackwalk.cc:139: ERROR: MinidumpProcessor::Process failed
[task 2018-04-13T02:07:43.308Z] 02:07:43     INFO - 2018-04-13 02:07:43: minidump.cc:4331: INFO: Minidump closing minidump
[task 2018-04-13T02:07:43.310Z] 02:07:43     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmp7h19jw.mozrunner/minidumps/44d12e3e-6ac6-543a-6e32-af01d09c6432.dmp /builds/worker/workspace/build/symbols
[task 2018-04-13T02:07:49.073Z] 02:07:49     INFO - mozcrash Saved minidump as /builds/worker/workspace/build/blobber_upload_dir/44d12e3e-6ac6-543a-6e32-af01d09c6432.dmp
[task 2018-04-13T02:07:49.074Z] 02:07:49     INFO - mozcrash Saved app info as /builds/worker/workspace/build/blobber_upload_dir/44d12e3e-6ac6-543a-6e32-af01d09c6432.extra
[task 2018-04-13T02:07:49.223Z] 02:07:49     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ libpthread-2.23.so + 0x1026d]
[task 2018-04-13T02:07:42.553Z] 02:07:42     INFO - TEST-PASS | dom/events/test/test_wheel_default_action.html | doTestAutoDirScroll(aSettings=deltaX is reverted, {honoursRoot: false}), root = writingMode: horizontal-tb; direction: rtl; target = writingMode: vertical-rl; direction: ltr; adjusted auto-dir scroll to right(originally top) by pixel scroll even if lineOrPageDelta is 0, no vertical scrollbar: scrolled vertical 
[task 2018-04-13T02:07:42.554Z] 02:07:42     INFO - TEST-PASS | dom/events/test/test_wheel_default_action.html | doTestAutoDirScroll(aSettings=deltaX is reverted, {honoursRoot: false}), root = writingMode: horizontal-tb; direction: rtl; target = writingMode: vertical-rl; direction: ltr; adjusted auto-dir scroll to right(originally top) by pixel scroll even if lineOrPageDelta is 0, no vertical scrollbar: not scrolled to right, got -992 
[task 2018-04-13T02:07:42.555Z] 02:07:42     INFO - Buffered messages finished
[task 2018-04-13T02:07:42.558Z] 02:07:42    ERROR - TEST-UNEXPECTED-TIMEOUT | dom/events/test/test_wheel_default_action.html | application timed out after 370 seconds with no output
Severity: critical → normal
Keywords: crash
Summary: Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ libpthread-2.23.so + 0x1026d] → Intermittent TEST-UNEXPECTED-TIMEOUT | dom/events/test/test_wheel_default_action.html | application timed out after 370 seconds with no output
Based on comment 1 this looks like a regression from bug 1358017.
Blocks: autodir
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> Based on comment 1 this looks like a regression from bug 1358017.

The test time has been increased for autodir. So 370 seconds is not enough. It may need nearly 10 minutes. So it's not a regression. The allowed test time for the file has also been increased, see https://searchfox.org/mozilla-central/source/dom/events/test/window_wheel_default_action.html#71
Since there is SimpleTest.requestLongerTimeout(6); in the file.

So I am wondering why 370 seconds tiggered the timed-out failure. Is that some external code which turns off the feature of requestLongerTimeout?
I think the problem might be that there was *no output* for 370 seconds. If you look at the log in comment 0, the last test output is preceded by a line that says "Buffered messages logged at 02:01:28" and the timeout happens at 02:07:42 which is 374 seconds later. So either the test is hanging, or it's running through a patch of 6+ minutes of testing where there's no test output at all. Either way it happens after the test that outputs "doTestAutoDirScroll(aSettings=deltaX is reverted, {honoursRoot: false}), root = writingMode: horizontal-tb; direction: rtl; target = writingMode: vertical-rl; direction: ltr; adjusted auto-dir scroll to right(originally top) by pixel scroll even if lineOrPageDelta is 0, no vertical scrollbar: not scrolled to right, got -992"
And actually if you look further up in the log you see lots of this:

[task 2018-04-13T02:00:55.612Z] 02:00:55     INFO - GECKO(1665) | ###!!! ERROR: Potential deadlock detected:
[task 2018-04-13T02:00:55.613Z] 02:00:55     INFO - GECKO(1665) | === Cyclical dependency starts at
[task 2018-04-13T02:00:55.613Z] 02:00:55     INFO - GECKO(1665) | --- RecursiveMutex : APZCTreeLock calling context
[task 2018-04-13T02:00:55.614Z] 02:00:55     INFO - GECKO(1665) |   [stack trace unavailable]
[task 2018-04-13T02:00:55.615Z] 02:00:55     INFO - GECKO(1665) | --- Next dependency:
[task 2018-04-13T02:00:55.616Z] 02:00:55     INFO - GECKO(1665) | --- RecursiveMutex : AsyncPanZoomController (currently acquired)
[task 2018-04-13T02:00:55.617Z] 02:00:55     INFO - GECKO(1665) |  calling context
[task 2018-04-13T02:00:55.618Z] 02:00:55     INFO - GECKO(1665) |   [stack trace unavailable]
[task 2018-04-13T02:00:55.619Z] 02:00:55     INFO - GECKO(1665) | === Cycle completed at
[task 2018-04-13T02:00:55.620Z] 02:00:55     INFO - GECKO(1665) | --- RecursiveMutex : APZCTreeLock calling context
[task 2018-04-13T02:00:55.620Z] 02:00:55     INFO - GECKO(1665) |   [stack trace unavailable]
[task 2018-04-13T02:00:55.621Z] 02:00:55     INFO - GECKO(1665) | Deadlock may happen for some other execution
[task 2018-04-13T02:00:55.622Z] 02:00:55     INFO - GECKO(1665) | [Parent 1665, Main Thread] WARNING: Potential deadlock detected:
[task 2018-04-13T02:00:55.622Z] 02:00:55     INFO - GECKO(1665) | Cyclical dependency starts at
[task 2018-04-13T02:00:55.623Z] 02:00:55     INFO - GECKO(1665) | RecursiveMutex : APZCTreeLock
[task 2018-04-13T02:00:55.624Z] 02:00:55     INFO - GECKO(1665) | Next dependency:
[task 2018-04-13T02:00:55.625Z] 02:00:55     INFO - GECKO(1665) | RecursiveMutex : AsyncPanZoomController (currently acquired)
[task 2018-04-13T02:00:55.626Z] 02:00:55     INFO - GECKO(1665) | Cycle completed at
[task 2018-04-13T02:00:55.626Z] 02:00:55     INFO - GECKO(1665) | RecursiveMutex : APZCTreeLock
[task 2018-04-13T02:00:55.627Z] 02:00:55     INFO - GECKO(1665) | Deadlock may happen for some other execution
[task 2018-04-13T02:00:55.628Z] 02:00:55     INFO - GECKO(1665) | : file /builds/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp, line 309

so it's quite likely that the test is hitting a deadlock because it's acquiring locks in the wrong order.
https://hg.mozilla.org/integration/autoland/rev/509187cd4c1a#l4.220 for example is a call that acquires the APZ tree lock while holding the AsyncPanZoomController's mRecursiveMutex, and so violates the lock ordering requirements.
Here's a try push that might fix it: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d270e34520b9a2642af0af4e88e74e97b82de745
Component: DOM: Events → Panning and Zooming
OS: Unspecified → All
Priority: P5 → P2
Hardware: Unspecified → All
Version: unspecified → 61 Branch
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> https://hg.mozilla.org/integration/autoland/rev/509187cd4c1a#l4.220 for
> example is a call that acquires the APZ tree lock while holding the
> AsyncPanZoomController's mRecursiveMutex, and so violates the lock ordering
> requirements.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Here's a try push that might fix it:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d270e34520b9a2642af0af4e88e74e97b82de745

Wow. Thank you very much for the spot of the potential deadlock.
Try push seems to be good, I don't see any deadlock warnings in the log.
Assignee: nobody → bugmail
Thank you for your trust.

But I am sorry, Kartikaya. I am in L1, so I don't have access to review.

You may need to find someone else to review it.
Ah, MozReview might not give you permission but technically you're still allowed to review patches if the patch was written by the relevant module owner or peer, which I am. I think you're an appropriate reviewer since you added the code in question and understand it.

I'm attaching the patch as an attachment instead and obsoleting the MozReview version, so you should be able to review it by changing the review flag to r+. I can then land it manually on inbound.
Attachment #8968902 - Attachment is obsolete: true
Attachment #8968902 - Flags: review?(zjz)
Attachment #8968910 - Flags: review?(zjz)
Comment on attachment 8968910 [details] [diff] [review]
Patch - remove deadlock scenario

Okay, done
Attachment #8968910 - Flags: review?(zjz) → review+
Thanks! I'll land it on inbound once the trees are open again.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a12624f2ce5b
Avoid potential deadlock scenario caused by improper lock acquisiting order. r=zjz
https://hg.mozilla.org/mozilla-central/rev/a12624f2ce5b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please don't reopen bugs where patches have landed. File a new bug if needed.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=197969318&repo=mozilla-central&lineNumber=15068
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: → 1489395
I filed a new bug (bug 1489395) for this, because this bug patch has landed.

So, closing this bug.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.