Intermittent TEST-UNEXPECTED-TIMEOUT | gfx/layers/apz/test/mochitest/test_wheel_scroll.html | application timed out after 330 seconds with no output

RESOLVED FIXED in Firefox 57

Status

()

Core
Security: Process Sandboxing
RESOLVED FIXED
5 months ago
3 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: kats)

Tracking

(Blocks: 1 bug, {intermittent-failure})

unspecified
mozilla57
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

(Whiteboard: [stockwell fixed:other]sb+)

Attachments

(1 attachment)

(Reporter)

Description

5 months ago
treeherder
Filed by: wkocher [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=112245990&repo=autoland

https://queue.taskcluster.net/v1/task/D9LL18GmTNmsI3gtNMGFgA/runs/0/artifacts/public/logs/live_backing.log
Log contains this:

Sandbox: seccomp sandbox violation: pid 1131, tid 1200, syscall 241, args 0 128 3685376576 4293294352 3685376704 3685376712.  Killing process.

And then the dead process causes the test to time out.
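When scanning many logs for this failure, the violation line can be picked apart mechanically. The following is a minimal sketch; the regular expression is inferred from the single log line quoted above rather than from a documented format, and `parse_violation` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one violation line quoted in this bug;
# it is an assumption, not a documented log format.
VIOLATION_RE = re.compile(
    r"seccomp sandbox violation: pid (?P<pid>\d+), tid (?P<tid>\d+), "
    r"syscall (?P<syscall>\d+), args (?P<args>(?:\d+ ?)+)\."
)


def parse_violation(line):
    """Return the numeric fields of a violation line, or None if it does not match."""
    match = VIOLATION_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "tid": int(match.group("tid")),
        "syscall": int(match.group("syscall")),
        "args": [int(a) for a in match.group("args").split()],
    }


LOG_LINE = (
    "Sandbox: seccomp sandbox violation: pid 1131, tid 1200, syscall 241, "
    "args 0 128 3685376576 4293294352 3685376704 3685376712.  Killing process."
)

if __name__ == "__main__":
    print(parse_violation(LOG_LINE))
```

The syscall number and raw arguments are the useful part for triage, since the number alone is ambiguous until the architecture is known.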
Component: Panning and Zooming → Security: Process Sandboxing
On x86_64, syscall 241 is |__NR_mq_unlink|; on x86_32 it's |__NR_sched_setaffinity|.
Those numbers in comment #1 look very 32-bit (also, Treeherder shows it's a 32-bit job).
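The "looks very 32-bit" reasoning can be checked with a few lines. This is a sketch only: the lookup table holds just the two entries named above, and the pointer test is a rough heuristic of my own, not something the sandbox code does. As a further hint, the second argument, 128, matches the size in bytes of glibc's `cpu_set_t`, which would be consistent with a `sched_setaffinity(0, sizeof(cpu_set_t), &mask)` call.

```python
# Only the two mappings stated in this bug; not a complete syscall table.
SYSCALL_241 = {"x86_64": "mq_unlink", "x86_32": "sched_setaffinity"}

# Arguments copied from the violation line in the description.
ARGS = [0, 128, 3685376576, 4293294352, 3685376704, 3685376712]


def fits_in_32_bits(values):
    """True if every value is representable as an unsigned 32-bit integer."""
    return all(0 <= v < 2**32 for v in values)


def looks_like_high_32bit_pointer(value):
    """Heuristic (assumption): value lies in the upper half of a 32-bit address space."""
    return 2**31 <= value < 2**32


if __name__ == "__main__":
    print("all args fit in 32 bits:", fits_in_32_bits(ARGS))
    print("pointer-like args:", [a for a in ARGS if looks_like_high_32bit_pointer(a)])
    print("syscall 241 on x86_32:", SYSCALL_241["x86_32"])
```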

This seems to be used by Chaos Mode: https://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/xpcom/threads/nsThread.cpp#467

What I don't understand is why this would be intermittent, or would have started recently — the code in question seems to have been present (and enabled for this test) for years.
(Assignee)

Updated

5 months ago
Duplicate of this bug: 1379009
Two bugs were filed for this failure today, after it had never been seen before. Something must have changed recently to start triggering this.

Comment 6

4 months ago
20 failures in 656 pushes (0.03 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 14
* mozilla-inbound: 5
* mozilla-central: 1

Platform breakdown:
* linux32: 20

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-07-03&endday=2017-07-09&tree=all
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #3)
> This seems to be used by Chaos Mode:
> https://searchfox.org/mozilla-central/rev/5e1e8d2f244bd8c210a578ff1f65c3b720efe34e/xpcom/threads/nsThread.cpp#467

I would propose whitelisting the syscall via our preferences, specifically for tests that rely on Chaos Mode.

Updated

4 months ago
Whiteboard: sblc3
We could also remove chaos mode from the tests. I don't think the tests *rely* on it; they just use it because it flushes out intermittent bugs faster, but I don't know if it really provides that much value.

Comment 9

4 months ago
17 failures in 720 pushes (0.024 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 8
* autoland: 7
* try: 1
* pine: 1

Platform breakdown:
* linux32: 17

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-07-10&endday=2017-07-16&tree=all
This particular chaos mode feature, as it's used here, seems to apply only to threads started while the test is running.  That might help explain why it's intermittent and started happening recently.  But also, I'm not sure how much it's helping in that case.

If it were enabled at process startup it might be more useful, and then we could check for it when creating the sandbox policy so we wouldn't have to change anything for non-test usage.
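To make the startup-time idea concrete, here is a purely illustrative sketch. It is not the actual sandbox policy code: `BASE_ALLOWED`, `build_allowed_syscalls`, and the `chaos_mode_at_startup` flag are hypothetical names, and the baseline set is a tiny stand-in for a real policy.

```python
# Hypothetical stand-in for the normal content-process allow list;
# the real policy is far larger and is not expressed this way.
BASE_ALLOWED = frozenset({"read", "write", "futex"})


def build_allowed_syscalls(chaos_mode_at_startup):
    """Return the allow list, adding sched_setaffinity only when chaos mode
    is known to be on at the moment the policy is built."""
    allowed = set(BASE_ALLOWED)
    if chaos_mode_at_startup:
        # Chaos mode pins threads to CPUs, which needs this syscall.
        allowed.add("sched_setaffinity")
    return frozenset(allowed)


if __name__ == "__main__":
    print(sorted(build_allowed_syscalls(False)))
    print(sorted(build_allowed_syscalls(True)))
```

The point of deciding at policy-build time is that non-test processes never get the extra syscall, so nothing changes for regular usage.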

Comment 11

4 months ago
27 failures in 822 pushes (0.033 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 14
* mozilla-inbound: 7
* pine: 3
* try: 2
* mozilla-central: 1

Platform breakdown:
* linux32: 26
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-07-17&endday=2017-07-23&tree=all

Comment 12

4 months ago
26 failures in 1008 pushes (0.026 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 16
* mozilla-inbound: 8
* try: 2

Platform breakdown:
* linux32: 25
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-07-24&endday=2017-07-30&tree=all

Updated

4 months ago
Blocks: 1257239

Updated

4 months ago
Whiteboard: sblc3 → sb+

Comment 13

3 months ago
34 failures in 888 pushes (0.038 failures/push) were associated with this bug in the last 7 days.   

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 16
* autoland: 15
* try: 1
* oak: 1
* mozilla-central: 1

Platform breakdown:
* linux32: 34

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-07-31&endday=2017-08-06&tree=all

Comment 14

3 months ago
25 failures in 901 pushes (0.028 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 12
* mozilla-inbound: 11
* try: 2

Platform breakdown:
* linux32: 18
* linux64: 3
* linux64-stylo-sequential: 2
* linux64-stylo: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-08-07&endday=2017-08-13&tree=all
So what should we do here? Is somebody going to whitelist/work around this in the code, or should I just remove the chaos-mode annotation?
Flags: needinfo?(jld)
I'd say we should remove the chaos mode annotations.

They don't seem to really be doing anything useful at present; you'd need to enable the thread scheduling changes early in startup so they apply to all threads, at which point it'd be easy to test for it when building the sandbox policy.
Flags: needinfo?(jld)
(Assignee)

Updated

3 months ago
Assignee: nobody → bugmail
Comment hidden (mozreview-request)

Comment 18

3 months ago
mozreview-review
Comment on attachment 8897151 [details]
Bug 1378944 - Stop running some APZ mochitests with chaos mode since it causes sandbox failures and doesn't provide much value.

https://reviewboard.mozilla.org/r/168452/#review173688
Attachment #8897151 - Flags: review?(jld) → review+

Comment 19

3 months ago
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f374ccb241c7
Stop running some APZ mochitests with chaos mode since it causes sandbox failures and doesn't provide much value. r=jld

Comment 20

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f374ccb241c7
Status: NEW → RESOLVED
Last Resolved: 3 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57

Comment 21

3 months ago
8 failures in 949 pushes (0.008 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 6
* try: 1
* mozilla-inbound: 1

Platform breakdown:
* linux32: 6
* linux64-stylo-sequential: 1
* linux64-stylo: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1378944&startday=2017-08-14&endday=2017-08-20&tree=all
Whiteboard: sb+ → [stockwell fixed:other]sb+