Intermittent raptor-browsertime Critical: Browsertime process timed out after waiting X seconds for output
Categories: Testing :: Raptor, defect, P5
Tracking: Not tracked
People: Reporter: intermittent-bug-filer, Unassigned
Details: Keywords: intermittent-failure, Whiteboard: [retriggered]
Attachments: 1 file (143.58 KB, image/png)
Filed by: mlaza [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=381820969&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/H0xJNvotS0-oL_QaAJhCbg/runs/0/artifacts/public/logs/live_backing.log
[task 2022-06-19T03:47:02.968Z] 03:47:02 INFO - raptor-browsertime Info: timeout (s): 72000
[task 2022-06-19T03:47:02.968Z] 03:47:02 INFO - raptor-browsertime Info: browsertime cwd: /home/cltbld/tasks/task_165560485162194/build
[task 2022-06-19T03:47:02.969Z] 03:47:02 INFO - raptor-browsertime Info: browsertime cmd: /home/cltbld/tasks/task_165560485162194/fetches/node/bin/node /home/cltbld/tasks/task_165560485162194/fetches/browsertime/node_modules/browsertime/bin/browsertime.js --firefox.geckodriverPath /home/cltbld/tasks/task_165560485162194/fetches/geckodriver /home/cltbld/tasks/task_165560485162194/build/tests/raptor/raptor/browsertime/../../browsertime/browsertime_pageload.js --firefox.noDefaultPrefs --browsertime.page_cycle_delay 5000 --skipHar --pageLoadStrategy none --webdriverPageload true --firefox.disableBrowsertimeExtension true --pageCompleteCheckStartWait 5000 --pageCompleteCheckPollTimeout 1000 --timeouts.pageLoad 72000 --timeouts.script 144000 --browsertime.page_cycles 2 --pageCompleteWaitTime 5000 --browsertime.url https://www.cnn.com/2021/03/22/weather/climate-change-warm-waters-lake-michigan/index.html --browsertime.post_startup_delay 1000 --iterations 5 --videoParams.androidVideoWaitTime 10000 --browsertime.chimera true --browsertime.secondary_url https://www.cnn.com/weather --browsertime.commands --viewPort 1280x1024 --browser firefox --firefox.binaryPath /home/cltbld/tasks/task_165560485162194/build/application/firefox/firefox --firefox.profileTemplate /tmp/tmpf4vktcur/profile --resultDir /home/cltbld/tasks/task_165560485162194/build/blobber_upload_dir/browsertime-results/cnn --video true --visualMetrics true --visualMetricsContentful true --visualMetricsPerceptual true --visualMetricsPortable true --videoParams.keepOriginalVideo true --firefox.windowRecorder true --browsertime.testName cnn --browsertime.liveSite True --browsertime.loginRequired False
[task 2022-06-19T03:47:02.969Z] 03:47:02 INFO - raptor-browsertime Info: browsertime_ffmpeg: /home/cltbld/tasks/task_165560485162194/fetches/ffmpeg-4.1.4-i686-static/ffmpeg
[task 2022-06-19T03:47:02.969Z] 03:47:02 INFO - raptor-browsertime Info: PATH: b'/home/cltbld/tasks/task_165560485162194/fetches/ffmpeg-4.1.4-i686-static:/home/cltbld/tasks/task_165560485162194/build/venv/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin'
[task 2022-06-19T03:47:03.343Z] 03:47:03 INFO - raptor-browsertime Info: Running tests using Firefox - 5 iteration(s)
[task 2022-06-19T03:47:03.345Z] 03:47:03 INFO - raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2022-06-19T03:47:05.847Z] 03:47:05 INFO - raptor-browsertime Info: Starting a browsertime pageload
[task 2022-06-19T03:47:05.847Z] 03:47:05 INFO - raptor-browsertime Info: Waiting for 1000 ms (post_startup_delay)
[task 2022-06-19T03:47:06.850Z] 03:47:06 INFO - raptor-browsertime Info: Navigating to about:blank, count: 0
[task 2022-06-19T03:47:06.850Z] 03:47:06 INFO - raptor-browsertime Info: Navigating to url about:blank iteration 1
[task 2022-06-19T03:47:11.896Z] 03:47:11 INFO - raptor-browsertime Info: Navigating to primary url:https://www.cnn.com/2021/03/22/weather/climate-change-warm-waters-lake-michigan/index.html
[task 2022-06-19T03:47:11.897Z] 03:47:11 INFO - raptor-browsertime Info: Cycle 0, waiting for 5000 ms
[task 2022-06-19T03:47:16.901Z] 03:47:16 INFO - raptor-browsertime Info: Cycle 0, starting the measure
[task 2022-06-19T03:47:16.902Z] 03:47:16 INFO - raptor-browsertime Info: Testing url https://www.cnn.com/2021/03/22/weather/climate-change-warm-waters-lake-michigan/index.html iteration 1
[task 2022-06-19T03:47:16.904Z] 03:47:16 INFO - raptor-browsertime Info: Start firefox window recorder.
[task 2022-06-19T03:51:17.601Z] 03:51:17 CRITICAL - raptor-browsertime Critical: Browsertime process timed out after waiting 240 seconds for output
[task 2022-06-19T03:51:17.602Z] 03:51:17 INFO - raptor-perftest Info: Removing temporary directory: /tmp/tmpf4vktcur
[task 2022-06-19T03:51:17.628Z] 03:51:17 ERROR - Traceback (most recent call last):
[task 2022-06-19T03:51:17.628Z] 03:51:17 INFO - File "/home/cltbld/tasks/task_165560485162194/build/tests/raptor/raptor/raptor.py", line 203, in <module>
[task 2022-06-19T03:51:17.628Z] 03:51:17 INFO - main()
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - File "/home/cltbld/tasks/task_165560485162194/build/tests/raptor/raptor/raptor.py", line 149, in main
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - File "/home/cltbld/tasks/task_165560485162194/build/tests/raptor/raptor/perftest.py", line 460, in run_tests
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - self.run_test(test, timeout=int(test.get("page_timeout")))
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - File "/home/cltbld/tasks/task_165560485162194/build/tests/raptor/raptor/browsertime/base.py", line 698, in run_test
[task 2022-06-19T03:51:17.629Z] 03:51:17 INFO - f"Browsertime process timed out after waiting {output_timeout} seconds "
[task 2022-06-19T03:51:17.630Z] 03:51:17 INFO - Exception: Browsertime process timed out after waiting 240 seconds for output
[task 2022-06-19T03:51:17.690Z] 03:51:17 ERROR - Return code: 1
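For readers unfamiliar with this failure mode: the error is an output timeout, meaning the harness gave up because the child process printed nothing for N seconds (240 in the log above), which is separate from the overall test timeout. The sketch below shows the general idea in Python. It is not Raptor's code; the function name `run_with_output_timeout` and its structure are assumptions made purely for illustration.

```python
import queue
import subprocess
import threading


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd and raise if it stays silent for output_timeout seconds.

    Hypothetical helper for illustration only; not taken from raptor.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward every output line, then a sentinel once the stream closes.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    collected = []
    while True:
        try:
            # The clock restarts with every line, so only silence triggers it.
            line = lines.get(timeout=output_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            raise Exception(
                f"process timed out after waiting {output_timeout} seconds for output"
            )
        if line is None:
            break
        collected.append(line.rstrip("\n"))
    return proc.wait(), collected
```

With this kind of check, a step that runs for a long time without logging (video conversion, for example) can trip the timeout even if it would eventually have finished.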
Comment 31•3 years ago
The Instagram job is failing frequently now, even where the job was green on the first run. Backfill range and retriggers.
The failures here seem similar to what happened in https://bugzilla.mozilla.org/show_bug.cgi?id=1784176#c29
Should we file a separate bug for these Instagram failures?
Hi Greg! Can you please take a look at this?
Thank you!
Comment 32•3 years ago
Hi Iulian, thanks for the needinfo! I'm looking into it.
It looks like we're timing out while we try to convert the video to 60 FPS, and the fact that we've got failures on previously passing pushes makes me wonder if there's an issue at Bitbar. :aerickson, could we try to reset the Android Bitbar hosts to see if that helps with this failure?
Comment 33•3 years ago
Yeah, I will ask Bitbar to restart the docker hosts (and11, and15, and18) for those devices.
Comment 34•3 years ago
Bitbar is draining and restarting these hosts now. Should be done in the next few hours.
Comment 37•3 years ago
There have been 30 total failures in the last 7 days, recent failure log.
Affected platforms are:
- android-hw-a51-11-0-aarch64-shippable-qr
- linux1804-64-shippable-qr
- macosx1015-64-shippable-qr
Comment 39•3 years ago
I noticed that the secondary_url for imgur returns a 404. I will see whether re-recording imgur with a different secondary_url helps.
Comment 41•3 years ago
This is getting frequent, with 7 failures out of 10 retriggers: push
[task 2023-01-31T09:32:50.908Z] 09:30:34 INFO - raptor-browsertime Info: Use the visual metrics portable script
[task 2023-01-31T09:32:50.908Z] 09:30:34 INFO - raptor-browsertime Info: Get visual metrics from the video
[task 2023-01-31T09:32:50.908Z] 09:30:42 INFO - raptor-browsertime Info: Converting video to 60 fps
[task 2023-01-31T09:32:50.908Z] 09:32:42 CRITICAL - raptor-browsertime Critical: Browsertime process timed out after waiting 120 seconds for output
[task 2023-01-31T09:32:50.908Z] 09:32:42 INFO - raptor-browsertime-android Info: removing reverse socket connections
[task 2023-01-31T09:32:50.908Z] 09:32:42 INFO - adb command_output: adb -s R58R10FFRYF wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
[task 2023-01-31T09:32:50.908Z] 09:32:42 INFO - adb shell_bool: adb -s R58R10FFRYF wait-for-device shell su -c 'test -d /sdcard/Android/data/org.mozilla.geckoview_example/files/test_root/org.mozilla.geckoview_example-geckodriver-profile/minidumps', timeout: None, timedout: None, exitcode: 1, output:
[task 2023-01-31T09:32:50.908Z] 09:32:42 INFO - raptor-mitmproxy Info: Mitmproxy stop!!
[task 2023-01-31T09:32:50.908Z] 09:32:42 INFO - raptor-mitmproxy Info: Stopping mitmproxy playback, killing process 1153
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - raptor-mitmproxy Info: Successfully killed the mitmproxy playback process
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - raptor-perftest Info: Removing temporary directory: /tmp/tmp1h38qba7
[task 2023-01-31T09:32:50.908Z] 09:32:43 ERROR - Traceback (most recent call last):
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - File "/builds/task_167515628175890/workspace/build/tests/raptor/raptor/raptor.py", line 204, in <module>
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - main()
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - File "/builds/task_167515628175890/workspace/build/tests/raptor/raptor/raptor.py", line 150, in main
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - File "/builds/task_167515628175890/workspace/build/tests/raptor/raptor/browsertime/android.py", line 251, in run_tests
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - return super(BrowsertimeAndroid, self).run_tests(tests, test_names)
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - File "/builds/task_167515628175890/workspace/build/tests/raptor/raptor/perftest.py", line 469, in run_tests
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - self.run_test(test, timeout=int(test.get("page_timeout")))
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - File "/builds/task_167515628175890/workspace/build/tests/raptor/raptor/browsertime/base.py", line 801, in run_test
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - raise Exception(
[task 2023-01-31T09:32:50.908Z] 09:32:43 INFO - Exception: Browsertime process timed out after waiting 120 seconds for output
[task 2023-01-31T09:32:50.908Z] 09:32:43 ERROR - Return code: 1
Andrew, can you take a quick look at this?
Thank you.
Comment 42•3 years ago
The a51 devices failing in those jobs are 3,6,8,11,15,19,21,22,27. Those are attached to docker hosts and11, and15, and16, and18.
I've asked bitbar to restart docker on those hosts.
Comment 43•3 years ago
Bitbar has rebooted the docker hosts.
Comment 46•3 years ago
There have been 37 total failures in the last 7 days.
Affected platforms are:
android-hw-a51-11-0-aarch64-shippable-qr
macosx1015-64-shippable-qr
linux1804-64-clang-trunk-qr
Recent failure log.
Comment 61•2 years ago
Instagram job is almost permafailing again, even where the job was green before. Backfill range and retriggers.
Hi Andrew! Can you please take a look at this? Maybe it's something related to bitbar again?
Thank you!
Comment 64•2 years ago
(In reply to Iulian Moraru from comment #61)
> Hi Andrew! Can you please take a look at this? Maybe it's something related to bitbar again?
> Thank you!
Sorry for the delay (was on PTO).
The jobs linked look to be on OS X workers. :masterwayz, have we made any changes to those recently?
RE: Bitbar, we haven't made any changes at Bitbar recently. The workers all currently have a pretty high success rate (~98%) with no outliers.
Comment 65•2 years ago
I have not touched the R8s in a while so I don't see why that would be the cause.
Comment 71•2 years ago
Hi Andrej! Can you please take a look at this, or maybe redirect this to someone? The fbnav jobs are permafailing now. This happens even where the job was previously green: backfill and retriggers. So far, this is happening on Linux 18.04 x64 WebRender Shippable and OS X 10.15 WebRender Shippable.
Thank you!
Comment 73•2 years ago
Can't reproduce on try.
But based on comment 71, it seems like this is an infra-related issue, since it's failing in the backfills and retriggers link?
Actually, looking at autoland and central today, fbnav doesn't seem to be permafailing anymore. This might be resolved. Will keep the ni? and monitor for the next week.
Comment 95•2 years ago
Hi Kash! Can you please take a look at this?
The imgur jobs seem to be permafailing on mozilla-beta and release. They also fail on backfills where the job previously passed.
In the links that I shared above, you can see the last time they were green on beta and release.
Thank you!
Comment 97•2 years ago
Iulian, this might be resolved by uplifting https://bugzilla.mozilla.org/show_bug.cgi?id=1863130
I will look into doing just that.
Comment 98•2 years ago
(In reply to Kash Shampur [:kshampur] ⌚EST from comment #97)
> Iulian, this might be resolved by uplifting https://bugzilla.mozilla.org/show_bug.cgi?id=1863130
> I will look into doing just that.
Uplifting presents some complications due to perfdoc conflicts.
It might be best to just wait for this to make its way to beta/release.
:sparky looked into this: https://bugzilla.mozilla.org/show_bug.cgi?id=1863130#c16
Comment 106•2 years ago
This has been failing frequently lately, alongside Bug 1873581. Alex, could you have a look?
Comment 108•2 years ago
Hi Cosmin, could you backfill some of these tests? Hopefully, we can pinpoint a culprit commit in this range.
Comment 109•2 years ago
Backfills: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&fromchange=db832cc2d4e687b96654cf986f58183ba6ec8190&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Cshippable%2Copt%2Cbrowsertime%2Cperformance%2Ctests%2Con%2Cfirefox%2Ctest-linux1804-64-shippable-qr%2Fopt-browsertime-tp6-essential-firefox-instagram%2Cinstagram&tochange=ab59c4318ce5bd207b3c2852f7da152796d5bf06&selectedTaskRun=dpZsnGwbQ7yGe-dOh7e4tQ.0
I can't tell how far back this goes. I went back as far as 04.01.2024 and the regression is not in this range, so I don't know whether this is caused by a commit or something else.
Comment 111•2 years ago
The following machines fail 60-70% of the tasks assigned to them and mostly log this error:
t-linux64-ms-003
t-linux64-ms-004
t-linux64-ms-005
t-linux64-ms-006
t-linux64-ms-007
t-linux64-ms-008
t-linux64-ms-009
t-linux64-ms-010
t-linux64-ms-011
t-linux64-ms-012
The machines in the pool are numbered 001..240. Has the affected range seen modifications before, or have higher numbers been set up differently?
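For anyone repeating this kind of analysis, here is a rough Python sketch of how a per-worker failure rate could be tallied from task results. It assumes you already have (worker, passed) pairs exported from somewhere. The function names, the 60% threshold, the minimum task count, and the worker names in the usage note are all hypothetical and not taken from any Mozilla tooling.

```python
from collections import Counter


def failure_rates(tasks, min_tasks=1):
    """Map each worker to its failure rate.

    tasks is an iterable of (worker_id, passed) pairs; workers with fewer
    than min_tasks results are left out so one bad run does not dominate.
    """
    total = Counter()
    failed = Counter()
    for worker, passed in tasks:
        total[worker] += 1
        if not passed:
            failed[worker] += 1
    return {w: failed[w] / total[w] for w in total if total[w] >= min_tasks}


def suspicious_workers(tasks, threshold=0.6, min_tasks=5):
    """Return the workers whose failure rate is at or above threshold."""
    rates = failure_rates(tasks, min_tasks)
    return sorted(w for w, rate in rates.items() if rate >= threshold)
```

Calling `suspicious_workers([("worker-a", False)] * 7 + [("worker-a", True)] * 3)` would flag `worker-a`, since 7 of its 10 tasks failed.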
Comment 112•2 years ago
It looks like the recent failures in this bug are also related (they have the same machine numbers): https://bugzilla.mozilla.org/show_bug.cgi?id=1870979
Comment 113•2 years ago
Comment 111 is also true for Bug 1873581, which appeared in the same timeframe.
Comment 114•2 years ago
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #111)
> The following machines fail 60-70% of the tasks assigned to them and mostly log this error:
> t-linux64-ms-003
> t-linux64-ms-004
> t-linux64-ms-005
> t-linux64-ms-006
> t-linux64-ms-007
> t-linux64-ms-008
> t-linux64-ms-009
> t-linux64-ms-010
> t-linux64-ms-011
> t-linux64-ms-012
These are all on moon-chassis-1.inband.releng.mdc1.mozilla.com (01-15 are on it). Perhaps we're seeing some hardware failure.
Jonathan and/or Mark, have you noticed any flakiness on the Windows instances on this chassis?
> The machines in the pool are numbered 001..240. Has the affected range seen modifications before, or have higher numbers been set up differently?
We haven't changed anything recently on the moonshot linux hosts. They have all been configured identically as far as I know.
Should we quarantine these for now? Not sure of the impact of losing 10 nodes on queue times... but I guess test failures are worse.
Comment 115•2 years ago
Machines mentioned in comment 111 have been quarantined.
Comment 116•2 years ago
I've created https://mozilla-hub.atlassian.net/browse/RELOPS-806 to track the work to get these 10 workers fixed.
Comment 122•2 years ago
This has started to be very frequent again.
Most of the failures happen on these machines:
- t-linux64-ms-091 - 13 times
- t-linux64-ms-060 - 7 times
- t-linux64-ms-059 - 10 times
- t-linux64-ms-058 - 9 times
- t-linux64-ms-057 - 13 times
- t-linux64-ms-056 - 8 times
- t-linux64-ms-054 - 6 times
- t-linux64-ms-053 - 8 times
- t-linux64-ms-052 - 13 times
- t-linux64-ms-051 - 16 times
- t-linux64-ms-050 - 10 times
- t-linux64-ms-049 - 8 times
- t-linux64-ms-048 - 5 times
- t-linux64-ms-046 - 11 times
- t-linux64-ms-015 - 7 times
- t-linux64-ms-014 - 12 times
- t-linux64-ms-013 - 6 times
NOTE: the failure counts for the machines listed above were accurate at the time I wrote this comment.
Hi Greg, Andrew! Can you please take a look at this?
Thank you!
Comment 127•2 years ago
Hi Greg. Can you please take a look at this failure?
There have been ~1200 failures in the last 7 days.
Comment 130•2 years ago
It looks like the machines that are failing are quite consistent. I'll ping :aerickson about it.
Comment 131•2 years ago
17 hosts quarantined.
t-linux64-ms-091,t-linux64-ms-060,t-linux64-ms-059,t-linux64-ms-058,t-linux64-ms-057,t-linux64-ms-056,t-linux64-ms-054,t-linux64-ms-053,t-linux64-ms-052,t-linux64-ms-051,t-linux64-ms-050,t-linux64-ms-049,t-linux64-ms-048,t-linux64-ms-046,t-linux64-ms-015,t-linux64-ms-014,t-linux64-ms-013
Will update https://mozilla-hub.atlassian.net/browse/RELOPS-806.
This pool is getting sort of small... we'll need to figure out these issues soon.
Comment 132•2 years ago
Lifting quarantine on the following hosts to test with :sparky's linked bug above.
t-linux64-ms-277,t-linux64-ms-278,t-linux64-ms-279,t-linux64-ms-280
Comment 134•2 years ago
The hosts above were quarantined because their firmware is newer (which was throwing off results). I've requarantined them and lifted quarantine on the following (to test sparky's PR further).
t-linux64-ms-091,t-linux64-ms-060,t-linux64-ms-059,t-linux64-ms-058,t-linux64-ms-057,t-linux64-ms-056,t-linux64-ms-054,t-linux64-ms-053
Comment 135•2 years ago
:sparky has requested that I lift the quarantine on the rest of the hosts to see if his fix is working.
Lifted on the following:
t-linux64-ms-003,t-linux64-ms-004,t-linux64-ms-005,t-linux64-ms-006,t-linux64-ms-007,t-linux64-ms-008,t-linux64-ms-009,t-linux64-ms-010,t-linux64-ms-011,t-linux64-ms-012,t-linux64-ms-013,t-linux64-ms-014,t-linux64-ms-015,t-linux64-ms-046,t-linux64-ms-047,t-linux64-ms-048,t-linux64-ms-049,t-linux64-ms-050,t-linux64-ms-051,t-linux64-ms-052
Comment 140•1 years ago
Update: still looking into this.
I'm unable to reproduce this reliably myself.
I noticed "Client TLS handshake failed. The client does not trust the proxy's certificate for ..." in the mitmproxy log, but that should've been taken care of in Bug 1877314, so it is probably something else entirely.
Comment 142•1 years ago
:aerickson I've tried reproducing this Mac issue on my Try push with mozscreenshots enabled.
I am not sure whether this is the cause of the error, but in the couple of jobs that failed with the same error as this bug, I noticed these pop-ups in the right-hand corner. They might be interfering with the test.
I don't see a pattern in the machine numbers, though, so I am not sure whether it is occurring on all of them...
Would you know how to dismiss these pop-ups?
Comment 143•1 years ago
I feel like we've done some work to suppress those notices, but I'm not exactly sure. Redirecting to our Mac expert Ryan.
Comment 144•1 years ago
Created a configuration profile to suppress the "Get to know your Mac" and "Tips" notifications. Deployed to r8/m1 staging pools to test.
Comment 147•1 years ago
Getting this working will not be an easy lift. Cut a ticket here: https://mozilla-hub.atlassian.net/browse/RELOPS-871
Comment 148•1 year ago
great, thank you for the update!
Comment 152•1 year ago
Update:
There have been 58 total failures within the last 7 days:
- 1 failure on Android 11.0 Samsung A51 AArch64 Shippable opt
- 1 failure on linux1804-64-clang-trunk-qr opt
- 15 failures on Linux 18.04 x64 WebRender Shippable opt
- 41 failures on OS X 10.15 WebRender Shippable opt
Recent failure: https://treeherder.mozilla.org/logviewer?job_id=452761865&repo=mozilla-central&lineNumber=1062
[task 2024-03-29T02:12:35.976Z] 02:12:35 INFO - TEST-INFO | screencapture: exit 0
[task 2024-03-29T02:12:35.976Z] 02:12:35 CRITICAL - raptor-browsertime Critical: Browsertime process timed out after waiting 120 seconds for output
[task 2024-03-29T02:12:35.977Z] 02:12:35 INFO - raptor-mitmproxy Info: MitmproxyDesktop stop!!
[task 2024-03-29T02:12:35.977Z] 02:12:35 INFO - raptor-mitmproxy Info: Mitmproxy stop!!
[task 2024-03-29T02:12:35.977Z] 02:12:35 INFO - raptor-mitmproxy Info: Stopping mitmproxy playback, killing process 1454
[task 2024-03-29T02:12:36.177Z] 02:12:36 INFO - raptor-mitmproxy Info: Successfully killed the mitmproxy playback process
[task 2024-03-29T02:12:36.177Z] 02:12:36 INFO - raptor-mitmproxy Info: Turning off the browser proxy
[task 2024-03-29T02:12:36.178Z] 02:12:36 INFO - raptor-mitmproxy Info: writing: /opt/worker/tasks/task_171167710726619/build/application/Firefox Nightly.app/Contents/Resources/distribution/policies.json
[task 2024-03-29T02:12:36.178Z] 02:12:36 INFO - raptor-perftest Info: Removing temporary directory: /var/folders/8y/lwsprh294bb8w7d_ymy6dtfw000014/T/tmpa6pp0pej
[task 2024-03-29T02:12:36.191Z] 02:12:36 ERROR - Traceback (most recent call last):
[task 2024-03-29T02:12:36.191Z] 02:12:36 INFO - File "/opt/worker/tasks/task_171167710726619/build/tests/raptor/raptor/raptor.py", line 188, in <module>
[task 2024-03-29T02:12:36.191Z] 02:12:36 INFO - main()
[task 2024-03-29T02:12:36.191Z] 02:12:36 INFO - File "/opt/worker/tasks/task_171167710726619/build/tests/raptor/raptor/raptor.py", line 137, in main
[task 2024-03-29T02:12:36.191Z] 02:12:36 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - File "/opt/worker/tasks/task_171167710726619/build/tests/raptor/raptor/perftest.py", line 499, in run_tests
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - self.run_test(test, timeout=int(test.get("page_timeout")))
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - File "/opt/worker/tasks/task_171167710726619/build/tests/raptor/raptor/browsertime/base.py", line 1039, in run_test
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - raise Exception(
[task 2024-03-29T02:12:36.192Z] 02:12:36 INFO - Exception: Browsertime process timed out after waiting 120 seconds for output
[task 2024-03-29T02:12:36.236Z] 02:12:36 INFO - Return code: 1
[task 2024-03-29T02:12:36.236Z] 02:12:36 WARNING - setting return code to 1
Comment 154•1 year ago
I am planning to re-record a few of the tests that have a failure count >= 2. Nearly all of these except imgur are on a mitmproxy version less than 8, so hopefully this helps.
Comment 165•1 year ago
As discussed in triage, :fbilt will be looking into this for a bit.
Comment 173•1 year ago
Checked one of the failed nodes (macmini-r8-36) and it's reporting 75GB of free space out of 250GB.
Looks like /opt/worker/downloads/ is 63.6GB and /opt/worker/cache/ is also 63GB.
If the tests are failing due to a lack of storage, this most likely is the cause.
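A check like the one described above could be scripted roughly as follows. This is only a sketch in Python using the standard library; the helper names `dir_size_bytes` and `free_fraction` are made up for illustration, and nothing here reflects how the numbers in this comment were actually gathered.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may vanish while a worker is running; ignore it.
                pass
    return total


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

On a worker, something like `dir_size_bytes("/opt/worker/downloads")` and `free_fraction("/opt/worker")` would give the two figures of interest, assuming those paths exist and are readable.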
Comment 175•1 year ago
Interesting find Ryan, thanks for looking into that! Would you be able to clean up some of those folders on the machines? Maybe we could start with the downloads folder to see if it helps with the failure.
Comment 176•1 year ago
While cleaning up r8-36, I ran into:
[root@macmini-r8-36.test.releng.mdc1.mozilla.com worker]# rm -rf downloads/*
-sh: /bin/rm: Argument list too long
Ended up doing a:
find downloads/ -mindepth 1 -exec rm -rf {} +
I'll report back if they all follow this pattern.
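Some background on the error above: "Argument list too long" comes from the shell expanding `downloads/*` into more arguments than the system allows for a single command, whereas `find ... -exec rm -rf {} +` passes the entries to `rm` in batches that fit. If the same cleanup were ever scripted in Python, a sketch along these lines would sidestep the limit too, because no glob expansion is involved. The function name `clear_directory` is hypothetical.

```python
import os
import shutil


def clear_directory(path):
    """Delete everything inside path but keep path itself.

    Returns the number of top-level entries removed.
    """
    # Snapshot the listing first so we never delete while iterating.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            # Regular files and symlinks (including symlinks to directories).
            os.unlink(entry.path)
    return len(entries)
```

As with the `find` command, the directory itself is left in place, so anything that expects `downloads/` to exist keeps working.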
Comment 177•1 year ago
Cleaned up:
macmini-r8-36
macmini-r8-147
macmini-r8-79
macmini-r8-259
macmini-r8-66
Lmk if the others need similar treatment :sparky
Comment 178•1 year ago
Thanks Ryan! I'll monitor over the next week to see if we still get failures on those machines. (Adding ni? for myself).
Comment 179•1 year ago
Hi Ryan, could we try cleaning up all the mac machines in the same way? I don't see the machines you cleaned up in the recent failures so maybe the clean up helped them.
Comment 180•1 year ago
No problem, :sparky. Our Safe_Runner tool is quarantining/running cleanup on the 138 hosts.
Comment 181•1 year ago
Nice, thanks Ryan!
Comment 182•1 year ago
This has been completed. It does look like there were some side effects, despite quarantining the hosts:
Comment 191•11 months ago
PR with fix for macOS: https://github.com/mozilla-platform-ops/ronin_puppet/pull/738
Comment 196•11 months ago
Hi! Can someone please take a look at this? It started permafailing on this merge to central and it is now failing where it was green before. Could this be another issue with workers?
Thank you!
Comment 197•11 months ago
It seems that this has actually recovered -> green retriggers. Not sure what happened here, maybe it was just a hiccup or it only affects certain workers?
Comment 198•11 months ago
Odd, but also historically we've always had issues with imgur having phases like this
Comment 199•10 months ago
Looking at the artifacts, it's failing during visual metrics calculations which we don't see that often. It might have been something going on with the machines.
Comment 206•10 months ago
After landing Bug 1920821, I am noticing far fewer reported Mac failures. I will keep my ni? for at least one more week just to monitor it.
Comment 208•9 months ago
(In reply to Kash Shampur [:kshampur] ⌚EST from comment #206)
> After landing Bug 1920821, I am noticing far fewer reported Mac failures. I will keep my ni? for at least one more week just to monitor it.
This is looking much better now, so canceling the ni?