osx tier-3 browsertime chrome/chromium pageload tests are failing to run on new osx 10.15 machines
Categories
(Testing :: Raptor, defect, P2)
Tracking
(firefox-esr78 unaffected, firefox-esr91 unaffected, firefox86 unaffected, firefox87 unaffected, firefox88 wontfix, firefox92 wontfix, firefox93 wontfix, firefox94 fixed)
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | unaffected |
firefox86 | --- | unaffected |
firefox87 | --- | unaffected |
firefox88 | --- | wontfix |
firefox92 | --- | wontfix |
firefox93 | --- | wontfix |
firefox94 | --- | fixed |
People
(Reporter: jmaher, Assigned: dhouse)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
pageload tests for chrome/chromium browsers via browsertime are failing on osx:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=8ce6e030833655e82d0e576cd937d442b4178b12
these are tier-3, but are expected to run. I have a patch to turn these on.
A few errors from the log:
[task 2021-03-18T05:21:27.486Z] 05:21:27 INFO - raptor-perftest Info: main raptor init, config is: {'verbose': False, 'e10s': True, 'app': 'chromium', 'symbols_path': 'https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/btBYMy-jR4eSH7-ATzWB1A/artifacts/public/build/target.crashreporter-symbols.zip', 'power_test': False, 'run_local': False, 'chimera': True, 'cold': True, 'gecko_profile_interval': None, 'binary': '/opt/worker/tasks/task_161604478481098/fetches/chrome-mac/Chromium.app/Contents/MacOS/Chromium', 'enable_control_server_wait': False, 'no_conditioned_profile': True, 'environment': {'MOZ_WEBRENDER': '1'}, 'platform': 'mac', 'conditioned_profile_scenario': 'settled', 'extra_prefs': {}, 'gecko_profile': False, 'gecko_profile_entries': None, 'disable_perf_tuning': False, 'host': '127.0.0.1', 'enable_fission': False, 'memory_test': False, 'enable_webrender': True, 'obj_path': None, 'live_sites': False, 'device_name': None, 'project': 'mozilla-central', 'cpu_test': False, 'is_release_build': False, 'using_condprof': False, 'processor': 'x86_64'}
[task 2021-03-18T05:21:27.488Z] 05:21:27 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_161604478481098/build/tests/raptor/raptor/profile_data/base
[task 2021-03-18T05:21:27.490Z] 05:21:27 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_161604478481098/build/tests/raptor/raptor/profile_data/common
[task 2021-03-18T05:21:27.490Z] 05:21:27 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_161604478481098/build/tests/raptor/raptor/profile_data/perf
[task 2021-03-18T05:21:27.490Z] 05:21:27 INFO - raptor-perftest Info: Merging profile: /opt/worker/tasks/task_161604478481098/build/tests/raptor/raptor/profile_data/raptor
[task 2021-03-18T05:21:27.490Z] 05:21:27 INFO - raptor-perftest Info: Local browser profile: /var/folders/l1/qh5x5zxd1j36jvq5yg7nn4r0000014/T/tmpoBU9KC.mozrunner/Default
[task 2021-03-18T05:21:27.491Z] 05:21:27 INFO - raptor-browsertime Info: Removing mozprofile delimiters from browser profile
[task 2021-03-18T05:21:27.491Z] 05:21:27 CRITICAL - raptor-browsertime Critical: [chromium] Exception [Errno 2] No such file or directory: '/var/folders/l1/qh5x5zxd1j36jvq5yg7nn4r0000014/T/tmpoBU9KC.mozrunner/Default/user.js' while removing mozprofile delimiters
[task 2021-03-18T05:21:27.491Z] 05:21:27 INFO - raptor-browsertime Info: cwd: '/opt/worker/tasks/task_161604478481098/build'
I think ^ is not a real error, but this next one looks more problematic is:
task 2021-03-18T05:21:28.242Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - File mitmproxy-5.1.1-osx.tar.gz fetched from https://tooltool.mozilla-releng.net/ as /opt/worker/tasks/task_161604478481098/testing/mozproxy/mitmdump-5.1.1/tmp8K2VXn
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - File integrity verified, renaming tmp8K2VXn to mitmproxy-5.1.1-osx.tar.gz
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - Updating local cache /builds/tooltool_cache...
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - Creating cache in /builds/tooltool_cache...
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: WARNING - Impossible to add file mitmproxy-5.1.1-osx.tar.gz to cache folder /builds/tooltool_cache
[task 2021-03-18T05:21:28.268Z] 05:21:28 ERROR - raptor-mitmproxy Info: Traceback (most recent call last):
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: File "/opt/worker/tasks/task_161604478481098/mozharness/external_tools/tooltool.py", line 1159, in fetch_files
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: os.makedirs(cache_folder, 0o0700)
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: makedirs(head, mode)
[task 2021-03-18T05:21:28.268Z] 05:21:28 INFO - raptor-mitmproxy Info: File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
[task 2021-03-18T05:21:28.269Z] 05:21:28 INFO - raptor-mitmproxy Info: mkdir(name, mode)
[task 2021-03-18T05:21:28.269Z] 05:21:28 INFO - raptor-mitmproxy Info: OSError: [Errno 30] Read-only file system: '/builds'
[task 2021-03-18T05:21:28.269Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - untarring "mitmproxy-5.1.1-osx.tar.gz"
[task 2021-03-18T05:21:28.423Z] 05:21:28 INFO - raptor-mitmproxy Info: downloading mitmproxy pageset
[task 2021-03-18T05:21:28.528Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - File mitm5-linux-firefox-amazon.zip not present in local cache folder /builds/tooltool_cache
[task 2021-03-18T05:21:28.528Z] 05:21:28 INFO - raptor-mitmproxy Info: INFO - Attempting to fetch from 'https://tooltool.mozilla-releng.net/'...
[task 2021-03-18T05:21:29.060Z] 05:21:29 INFO - raptor-mitmproxy Info: INFO - File mitm5-linux-firefox-amazon.zip fetched from https://tooltool.mozilla-releng.net/ as /opt/worker/tasks/task_161604478481098/testing/mozproxy/tmpfSOszU
[task 2021-03-18T05:21:29.061Z] 05:21:29 INFO - raptor-mitmproxy Info: INFO - File integrity verified, renaming tmpfSOszU to mitm5-linux-firefox-amazon.zip
[task 2021-03-18T05:21:29.062Z] 05:21:29 INFO - raptor-mitmproxy Info: INFO - Updating local cache /builds/tooltool_cache...
[task 2021-03-18T05:21:29.062Z] 05:21:29 INFO - raptor-mitmproxy Info: INFO - Creating cache in /builds/tooltool_cache...
[task 2021-03-18T05:21:29.064Z] 05:21:29 INFO - raptor-mitmproxy Info: WARNING - Impossible to add file mitm5-linux-firefox-amazon.zip to cache folder /builds/tooltool_cache
[task 2021-03-18T05:21:29.065Z] 05:21:29 ERROR - raptor-mitmproxy Info: Traceback (most recent call last):
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: File "/opt/worker/tasks/task_161604478481098/mozharness/external_tools/tooltool.py", line 1159, in fetch_files
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: os.makedirs(cache_folder, 0o0700)
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 150, in makedirs
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: makedirs(head, mode)
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py", line 157, in makedirs
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: mkdir(name, mode)
[task 2021-03-18T05:21:29.065Z] 05:21:29 INFO - raptor-mitmproxy Info: OSError: [Errno 30] Read-only file system: '/builds'
[task 2021-03-18T05:21:29.121Z] 05:21:29 INFO - raptor-mitmproxy Info: Adding /opt/worker/tasks/task_161604478481098/testing/mozproxy/mitm5-linux-firefox-amazon.zip to recording list
^ shows /builds as readonly, but we have configured 'tooltool_cache': '/builds/tooltool_cache'
, so it is using that- but then it self recovers and uses the temp folder location in /opt
Lastly we fail to get results:
[task 2021-03-18T05:21:31.621Z] 05:21:31 INFO - raptor-browsertime Info: PATH: /opt/worker/tasks/task_161604478481098/fetches/ffmpeg-4.1.1-macos64-static/bin:/opt/worker/tasks/task_161604478481098/build/venv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
[task 2021-03-18T05:21:32.739Z] 05:21:32 INFO - raptor-browsertime Info: Running tests using Chrome - 25 iteration(s)
[task 2021-03-18T05:21:34.380Z] 05:21:34 INFO - raptor-browsertime Info: Starting a browsertime pageload
[task 2021-03-18T05:21:34.380Z] 05:21:34 INFO - raptor-browsertime Info: Waiting for 30000 ms (post_startup_delay)
[task 2021-03-18T05:22:04.432Z] 05:22:04 INFO - raptor-browsertime Info: Navigating to about:blank
[task 2021-03-18T05:22:04.433Z] 05:22:04 INFO - raptor-browsertime Info: Navigating to url about:blank iteration 1
[task 2021-03-18T05:22:09.527Z] 05:22:09 INFO - raptor-browsertime Info: Cycle 0, waiting for 1000 ms
[task 2021-03-18T05:22:10.570Z] 05:22:10 INFO - raptor-browsertime Info: Cycle 0, starting the measure
[task 2021-03-18T05:22:10.570Z] 05:22:10 INFO - raptor-browsertime Info: Testing url https://www.amazon.com/s?k=laptop&ref=nb_sb_noss_1 iteration 1
[task 2021-03-18T05:24:10.638Z] 05:24:10 INFO - raptor-mitmproxy Info: MitmproxyDesktop stop!!
[task 2021-03-18T05:24:10.638Z] 05:24:10 INFO - raptor-mitmproxy Info: Mitmproxy stop!!
[task 2021-03-18T05:24:10.638Z] 05:24:10 INFO - raptor-mitmproxy Info: Stopping mitmproxy playback, killing process 916
[task 2021-03-18T05:24:10.804Z] 05:24:10 INFO - raptor-mitmproxy Info: Successfully killed the mitmproxy playback process
[task 2021-03-18T05:24:10.804Z] 05:24:10 INFO - perftest-results-handler Info: retrieving browsertime test results
[task 2021-03-18T05:24:10.804Z] 05:24:10 INFO - perftest-results-handler Critical: [chromium] unable to find browsertime results at /opt/worker/tasks/task_161604478481098/build/blobber_upload_dir/browsertime-results/amazon/browsertime.json
[task 2021-03-18T05:24:10.804Z] 05:24:10 ERROR - raptor-main Critical: [chromium] TEST-UNEXPECTED-FAIL: no raptor test results were found for amazon
[task 2021-03-18T05:24:10.918Z] 05:24:10 ERROR - Return code: 1
I don't know why we fail to get results. Pageload tests work on firefox browser in the referenced try push
as a side note, these were accidentally turned off on Monday March 8th, but prior to that they were running in automation with the same failures.
We know that chrome/chromium works due to raptor results, we know that pageload works due to firefox results, I am not sure what is broken with browsertime chrome/chromium pageload. These also worked well on 10.14 machines.
Reporter | ||
Comment 1•3 years ago
|
||
:sparky, I am not sure of how to interpret all of this, do you have a recommendation for where to hack on this and/or debug this?
Comment 2•3 years ago
•
|
||
Hmm, it looks like browsertime is having trouble making the url request based on the mitmproxy log which shows that the amazon url is never requested (https://firefoxci.taskcluster-artifacts.net/Fd9JAxMJT9ezQYhI4ZF_8g/0/public/test_info/mitmproxy.log). You could try to get more information from it by providing the --verbose argument to the tasks. Also, you can remove this line so we don't immediately kill browsertime to get more information: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/base.py#425
Reporter | ||
Comment 3•3 years ago
|
||
[task 2021-03-18T22:09:26.779Z] 22:09:26 INFO - raptor-browsertime Info: [browsertime] PageCompleteCheck returned true
[task 2021-03-18T22:09:26.779Z] 22:09:26 INFO - raptor-browsertime Info: [browsertime] Cycle 0, waiting for 1000 ms
[task 2021-03-18T22:09:27.785Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime] Cycle 0, starting the measure
[task 2021-03-18T22:09:27.786Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.command.measure] Testing url https://www.amazon.com/s?k=laptop&ref=nb_sb_noss_1 iteration 1
[task 2021-03-18T22:09:27.786Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.video] Add orange color
[task 2021-03-18T22:09:27.807Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.video] Start FFMPEG with ["-hide_banner","-video_size","1024x768","-f","x11grab","-framerate",30,"-probesize","10M","-y","-draw_mouse","0","-i",":0.0+0,0","-codec:v","libx264rgb","-crf",0,"-preset","ultrafast","-vf","pad=ceil(iw/2)*2:ceil(ih/2)*2","/home/cltbld/tasks/task_161610527768212/build/blobber_upload_dir/browsertime-results/amazon/x11-tmp.mp4"]
[task 2021-03-18T22:09:27.956Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.video] Input #0, x11grab, from ':0.0+0,0':
[task 2021-03-18T22:09:27.956Z] 22:09:27 INFO - raptor-browsertime Info: Duration: N/A, start: 1616105367.841895, bitrate: N/A
[task 2021-03-18T22:09:27.957Z] 22:09:27 INFO - raptor-browsertime Info:
[task 2021-03-18T22:09:27.957Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.readableStream.on.data (/home/cltbld/tasks/task_161610527768212/fetches/browsertime/node_modules/browsertime/lib/video/screenRecording/desktop/ffmpegRecorder.js:76:13)
[task 2021-03-18T22:09:27.957Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.emit (events.js:203:15)
[task 2021-03-18T22:09:27.958Z] 22:09:27 INFO - raptor-browsertime Info: at addChunk (_stream_readable.js:288:12)
[task 2021-03-18T22:09:27.958Z] 22:09:27 INFO - raptor-browsertime Info: at readableAddChunk (_stream_readable.js:269:11)
[task 2021-03-18T22:09:27.958Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.Readable.push (_stream_readable.js:224:10)
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: at Pipe.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.video] Stream #0:0: Video: rawvideo (BGR[0] / 0x524742), bgr0, 1024x768, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: Stream mapping:
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (libx264rgb))
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: Press [q] to stop, [?] for help
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info:
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.readableStream.on.data (/home/cltbld/tasks/task_161610527768212/fetches/browsertime/node_modules/browsertime/lib/video/screenRecording/desktop/ffmpegRecorder.js:76:13)
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.emit (events.js:203:15)
[task 2021-03-18T22:09:27.959Z] 22:09:27 INFO - raptor-browsertime Info: at addChunk (_stream_readable.js:288:12)
[task 2021-03-18T22:09:27.960Z] 22:09:27 INFO - raptor-browsertime Info: at readableAddChunk (_stream_readable.js:269:11)
[task 2021-03-18T22:09:27.960Z] 22:09:27 INFO - raptor-browsertime Info: at Socket.Readable.push (_stream_readable.js:224:10)
[task 2021-03-18T22:09:27.960Z] 22:09:27 INFO - raptor-browsertime Info: at Pipe.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)
[task 2021-03-18T22:09:27.961Z] 22:09:27 INFO - raptor-browsertime Info: [browsertime.video] Started ffmpeg with -hide_banner -video_size 1024x768 -f x11grab -framerate 30 -probesize 10M -y -draw_mouse 0 -i :0.0+0,0 -codec:v libx264rgb -crf 0 -preset ultrafast -vf pad=ceil(iw/2)*2:ceil(ih/2)*2 /home/cltbld/tasks/task_161610527768212/build/blobber_upload_dir/browsertime-results/amazon/x11-tmp.mp4
could ffmpeg need some library or dependency which isn't there on the new 10.15 machines? browsertime pageload seems to work for firefox, just not chrome/chromium, so I suspect software works, just some config/compatibility.
:sparky, any further ideas?
Comment 4•3 years ago
|
||
Set release status flags based on info from the regressing bug 1694185
Comment 5•3 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #3)
could ffmpeg need some library or dependency which isn't there on the new 10.15 machines? browsertime pageload seems to work for firefox, just not chrome/chromium, so I suspect software works, just some config/compatibility.
:sparky, any further ideas?
Hmm, it would be good to check it ffmpeg works on Firefox, you can do this by passing --browsertime-no-ffwindowrecorder
. Based on the artifacts that I'm seeing in the chrome run, it should work fine.
That said, I think the issue is coming from here:
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: [browsertime.video] Writing to /home/cltbld/tasks/task_161610527768212/build/blobber_upload_dir/browsertime-results/amazon/pages/www_amazon_com/s/query-79238515/data/video/1.mp4
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: events.js:174
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: throw er; // Unhandled 'error' event
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: ^
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info:
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: Error: ENOENT: no such file or directory, open '/home/cltbld/tasks/task_161610527768212/build/blobber_upload_dir/browsertime-results/amazon/chromedriver.log'
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: Emitted 'error' event at:
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: at lazyFs.open (internal/fs/streams.js:115:12)
[task 2021-03-18T22:09:52.979Z] 22:09:52 INFO - raptor-browsertime Info: at FSReqWrap.args [as oncomplete] (fs.js:140:20)
Browsertime is searching for a chromedriver.log file, but it doesn't exist. It's actually called chromedriver-1.log.gz
now, though it's possible that browsertime renamed it to this so maybe this is a race condition. That said, when I try to download the chromedriver.log artifact from here, I am hit with an access denied error, maybe this is related to the /builds
read-only issue? https://firefoxci.taskcluster-artifacts.net/A86xu0mzRLO4N_sQOaC_bg/0/public/test_info/browsertime-results/amazon/chromedriver-1.log.gz
Updated•3 years ago
|
Reporter | ||
Comment 6•3 years ago
|
||
I am not sure how to debug this. Can you show me where in the code chromedriver.log is created?
Comment 7•3 years ago
|
||
The chromedriver.log file is created by chromedriver, but browsertime modifies it here: https://github.com/sitespeedio/browsertime/blob/38dbc1c1285773ef571938ea94de2b62243c64d8/lib/chrome/webdriver/chromium.js#L162-L163
Comment hidden (Intermittent Failures Robot) |
Comment 10•3 years ago
|
||
:sparky could you provide an update? there's interest in having macOS firefox/chrome comparisons ahead of MR2 so we may need to increase the priority here. Alternatively, I know :jmaher is restoring some of the older Apple hardware for fission so we could use that, but I'd prefer to get this fixed and avoid taking that step back.
Updated•3 years ago
|
Comment 11•3 years ago
|
||
Increasing the priority as we've been without Chrome/Chromium results on macOS for a couple of months now. I haven't had time to access the mac mini loaner, and I'm also not entirely sure what I'd do to troubleshoot this.
:jmaher are there any OS updates we could apply to see if they help? Are there any macOS experts that might be able to help us here?
Comment 12•3 years ago
•
|
||
I'm pretty sure the issue is that we need to accept some screen capture permissions for FFMPEG. This is the command we need to run to enable that permission programatically:
sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" "insert into access (service, client, client_type, allowed, prompt_count, indirect_object_identifier, flags) values ('kTCCServiceScreenCapture', '/bin/bash', 1, 1, 1, 'UNUSED', 0)"
Reporter | ||
Comment 13•3 years ago
|
||
:dividehex, can you run the sqlite command from comment 12 on the osx R8 pool?
Comment 14•3 years ago
|
||
This is a no-go on the r8s
sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" "insert into access (service, client, client_type, allowed, prompt_count, indirect_object_identifier, flags) values ('kTCCServiceScreenCapture', '/bin/bash', 1, 1, 1, 'UNUSED', 0)"
Error: attempt to write a readonly database
Reporter | ||
Comment 15•3 years ago
|
||
:sparky- this isn't working on our machines- should we do more investigation, or do you have a backup technique
Comment 16•3 years ago
|
||
:dividehex, I think that command might need to be run in the recovery mode terminal.
I'm not sure what else we could do, I'd have to look around some more. Alternatively, if someone knows of something else we could try, what we're trying to do here with that command is make sure that we don't need to obtain user permission to capture/record the screen.
Comment 17•3 years ago
|
||
I'll do a little more research into this but the short of it is, Apple requires an actually Person to allow application control over various things like screen capture, mic, camera, etc. This command might actually work if I login through the GUI and execute it
There is currently no way to whitelist apps preemptively via puppet or MDM. Apple only allows you to administratively deny apps preemptively.
It is very frustrating and we aren't the only ones that suffer from this Apple enforced security policy.
I did find a script that has a little more info on it and we might be able to do something on initial deployment in order to get this to work. I'll continue to look into this but mind you I am pulled in many directions these days.
https://github.com/bp88/JSS-Scripts/blob/master/TCC.db%20Modifier.sh
Comment 18•3 years ago
•
|
||
(In reply to Greg Mierzwinski [:sparky] from comment #16)
:dividehex, I think that command might need to be run in the recovery mode terminal.
I'm not sure what else we could do, I'd have to look around some more. Alternatively, if someone knows of something else we could try, what we're trying to do here with that command is make sure that we don't need to obtain user permission to capture/record the screen.
I don't think this will work in recovery because the filesystem this lives on is still read-only. We can still try it though.
Comment 19•3 years ago
|
||
(In reply to Jake Watkins [:dividehex] from comment #18)
(In reply to Greg Mierzwinski [:sparky] from comment #16)
:dividehex, I think that command might need to be run in the recovery mode terminal.
I'm not sure what else we could do, I'd have to look around some more. Alternatively, if someone knows of something else we could try, what we're trying to do here with that command is make sure that we don't need to obtain user permission to capture/record the screen.
I don't think this will work in recovery because the filesystem this lives on is still read-only. We can still try it though.
:dividehex do you have an update?
Updated•3 years ago
|
Comment 20•3 years ago
|
||
:sparky could we temporarily run the tests without visual metrics on macOS?
Comment 22•3 years ago
|
||
(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #19)
:dividehex do you have an update?
I haven't had a chance to work on this but I'll pick it back up on Friday. Sorry for the delay.
Comment 23•3 years ago
|
||
I've done some testing with changing SIP and executing the direct TCC.db sql injection. Here are my findings:
- TCC.db is protected from direct writes. It gets updated through a protected OS api. This is how mdm profiles are able to make changes within Apples limits and restrictions.
- Preemptively whitelisting application in TCC with MDM is still not allowed for Camera, Microphone, and Screencapture per Apple's profile documentation
- Apple really wants a human user to approve, via the GUI, applications trying to access the Camera, Microphone or Screencapture
- The TCC.db can be directly written while booted in Recovery mode
- It can also be written to if SIP is disabled. We do not disable SIP on Apple Mac Minis since this significantly reduces the security of the host.
With that said, it is possible for us to preemptively inject the application whitelist using the recovery mode. Since this is going to require worker downtime and coordination with datacenter remote hands, I want to verify this actually achieves the desired result and subsequent solution.
Is /bin/bash
the correct application to allow
?
If I setup a loaner mac mini with this TCC.db change done, is someone available to test it works before rolling out to the rest of the workers?
Reporter | ||
Comment 24•3 years ago
|
||
:dividehex, I can work with :davehunt/etc. to test on a loaner. could you instead do this and create a pool t-osx-r8-qa or -qa1, etc. Then I could test on try and if needed get in via vnc/ssh to fix stuff.
Comment 25•3 years ago
|
||
I'm going to allocate macmini-r8-126.test.releng.mdc1.mozilla.com
for testing. I'll let you know the pool id as soon as QTS has finished the recovery mode/TCC.db insert.
Comment 26•3 years ago
|
||
QTS has disabled SIP on the loaner and I've handed it over to jmaher to test on.
Per slack convo with jmaher:
anyway, I set the TCC.db allow for the ScreenCapture service
10:17
-----------------------------------------------------------------------
Service | Client
-----------------------------------------------------------------------
kTCCServiceAccessibility | /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/AE.framework/Versions/A/Support/AEServer
kTCCServiceDeveloperTool | com.apple.Terminal
kTCCServicePostEvent | com.apple.screensharing.agent
kTCCServiceScreenCapture | /bin/bash
kTCCServiceScreenCapture | com.apple.screensharing.agent
kTCCServiceSystemPolicyAllFiles | /usr/libexec/sshd-keygen-wrapper
kTCCServiceSystemPolicyAllFiles | /usr/sbin/sshd
10:18
and I had qts disable SIP
[jwatkins@macmini-r8-126.test.releng.mdc1.mozilla.com ~]$ csrutil status
System Integrity Protection status: disabled.
10:19
That will allow you to manipulate the TCC.db directly without having to go into the recovery mode
10:20
I don't want to make disabling SIP the solution here but if you find an tcc exception that works, we can have qts go into recovery for the rest of the r8s and set it without having to completely disable SIP
Reporter | ||
Comment 27•3 years ago
|
||
I have verified via vnc and taskcluster that:
sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" "select * from access where service='kTCCServiceScreenCapture'"
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629156426
kTCCServiceScreenCapture|com.apple.Terminal|0|1|1||||UNUSED||0|1629130209
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613524400
will allow chrome/chromium tests to run in CI. :dividehex will work with QTS to get this rolled out to our pool of R8's. I don't know the timeline, but I would expect this to take a week or two.
Assignee | ||
Comment 28•3 years ago
|
||
I'm picking this up since Jake will be out. I read through the backlog, and I understand the needs and where we're at.
I agree disabling SIP will be the cleanest solution from the possible hacks I can imagine here*, and I'd like to follow-up, after we get it working and into production, with testing turning SIP back on after we've enabled the screen capture. That might preserve the safety of SIP, and simulate the typical user environment.
I'll ask QTS to disable SIP in steps across the R8 pool.
- (scripted vnc would be complicated, and a long workaround with full disk access could be brittle at future upgrades)
Assignee | ||
Comment 29•3 years ago
|
||
I'm asking QTS first to disable SIP on macmini-r8-{120..127} and I'm putting these into the r8-staging pool. Then we could run some tests to make sure it works (and how we're applying the change and get that into the configuration management bootstrap or puppet).
Assignee | ||
Comment 30•3 years ago
|
||
QTS disabled SIP on the r8-staging pool, macmini-r8-{120..127}: https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1015-r8-staging
I applied the direct tcc.db insert and verified the current state (macmini-r8-126 was the previously manually tested loaner and also has access granted to Terminal):
sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" "insert into access (service, client, client_type, allowed, prompt_count, indirect_object_identifier, flags) values (\"kTCCServiceScreenCapture\", \"/bin/bash\", 1, 1, 1, \"UNUSED\", 0)"
[...]
$ hn_tail=".test.releng.mdc1.mozilla.com"; for I in {120..127}; do hn=macmini-r8-$I; ( ssh -o LogLevel=ERROR -o StrictHostKeyChecking=no -o ConnectTimeout=120 -o UserKnownHostsFile=/dev/null ${hn}${hn_tail} 'printf "%s " $(hostname) "$(csrutil status)" "$(sudo grep workerType\" /opt/worker/generic-worker.conf.yaml)"; echo; sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" "select * from access where service=\"kTCCServiceScreenCapture\"" '); done
macmini-r8-120.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935852
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613096512
macmini-r8-121.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935886
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613097599
macmini-r8-122.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935886
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613094865
macmini-r8-123.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935886
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613150494
macmini-r8-124.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935886
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613095646
macmini-r8-125.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935886
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613095642
macmini-r8-126.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1|úÞ
|||UNUSED||0|1629156482
kTCCServiceScreenCapture|com.apple.Terminal|0|1|1|úÞ
|||UNUSED||0|1629130209
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613524400
macmini-r8-127.test.releng.mdc1.mozilla.com System Integrity Protection status: disabled. "workerType": "gecko-t-osx-1015-r8-staging",
kTCCServiceScreenCapture|/bin/bash|1|1|1||||UNUSED||0|1629935892
kTCCServiceScreenCapture|com.apple.screensharing.agent|0|1|1||||UNUSED||0|1613095648
Assignee | ||
Comment 31•3 years ago
|
||
:jmaher could you run the chromedriver tests against the releng-hardware/gecko-t-osx-1015-r8-staging pool to verify the tcc.db change? If it works, then I'll ask QTS to disable SIP across half of the macmini r8 production pool (and we can give it a few days to make sure it is sane for everything else).
Reporter | ||
Comment 32•3 years ago
|
||
this is failing for me, but looking at your query from comment 30, I don't see com.apple.Terminal except on macmini-r8-126. Is it possible to add that in to these machines and I can try again?
Assignee | ||
Comment 33•3 years ago
|
||
:jmaher did you test through a try push or through vnc? I think terminal is only required when running it from terminal (from the gui/vnc). If it is running from terminal in the task, then I'll add terminal also.
Reporter | ||
Comment 34•3 years ago
|
||
I did the tests via try server- quite likely terminal is only needed when in with VNC. But as it stands these tests are not running successfully whereas they did on try (~2 weeks ago - code could have changed, had the terminal for screencapture permissions, maybe other adjustments to allow for vnc/loaner/etc.?)
Assignee | ||
Comment 35•3 years ago
|
||
Thanks! I'll compare with the tcc.db contents in the loaner to make sure I'm not missing something there. And I may try testing through vnc on one of these and the loaner to reproduce it (and to see if it works through vnc on these new ones but not in a task)
Assignee | ||
Comment 36•3 years ago
|
||
I copied the full /opt/worker/tasks/jmaher tree over to macmini-r8-125 and ran it there (and I re-ran it on macmini-r8-126, the loaner) and the results show ScreenCapture blocked for sshd (since I'd ssh'd in).
I'll set the tcc.db to allow from ssh and Terminal and retry.
loaner (!26):
2021-08-26 18:01:16.284575+0000 0x13c0c Error 0xf4c5 162 0 tccd: [com.apple.TCC:access] notifyUserOfDeniedAccessBy: for /usr/libexec/sshd-keygen-wrapper fails when requestor has UID 0
2021-08-26 18:01:16.284577+0000 0x13c0c Default 0xf4c5 162 0 tccd: [com.apple.TCC:access] Service kTCCServiceScreenCapture does not allow prompting; returning denied.
non-loaner (125):
2021-08-26 17:48:38.276112+0000 0x2e386 Error 0x1fbb5 160 0 tccd: [com.apple.TCC:access] notifyUserOfDeniedAccessBy: for /usr/libexec/sshd-keygen-wrapper fails when requestor has UID 0
2021-08-26 17:48:38.276113+0000 0x2e386 Default 0x1fbb5 160 0 tccd: [com.apple.TCC:access] Service kTCCServiceScreenCapture does not allow prompting; returning denied.
Assignee | ||
Comment 37•3 years ago
|
||
It worked for me when manually running from ssh (once I had added ssh as allowed for screen capture).
https://treeherder.mozilla.org/jobs?repo=try&revision=aae6d4dbf64ba32e814f5ab11845163c12976e53 I think this is the last push you'd had for chromium/browsertime on the r8's, and it went to the production pool. Maybe I'm missing the tasks that went to -staging. I'll try re-running on the r8-staging pool.
Reporter | ||
Comment 38•3 years ago
|
||
I have had a lot of try pushes, here is when it was working against staging (Aug 17th):
https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=af3fSHY6QfCH0a4_B_0qeA.0&tier=1%2C2%2C3&searchStr=osx%2Cchrom&revision=378c736146a2798bd500fc5c4fc49da94d542a2c
Assignee | ||
Comment 39•3 years ago
|
||
Could you try against r8-staging again?
My re-runs of task, d6FB7W9dQXiRiPxzU2IZLQ (https://treeherder.mozilla.org/jobs?repo=try&revision=aae6d4dbf64ba32e814f5ab11845163c12976e53&selectedTaskRun=d6FB7W9dQXiRiPxzU2IZLQ.0), complete (like https://firefox-ci-tc.services.mozilla.com/tasks/HWhQ3bXlQ4SOmOrmMPWbWg/runs/0) on the -staging pool.
So I think it will work without adding access for Terminal (or ssh).
Reporter | ||
Comment 40•3 years ago
|
||
got it right this time, looking good:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=f0e3ca54fc6e5afff9b3a46e4f207e3daad57776
Assignee | ||
Comment 41•3 years ago
|
||
Great! Then I'll ask QTS to disable SIP on a block of the R8 production pool (and I'll apply the tcc.db change to those when they're done). And I'll put the block back into the pool and we can review in a few days to may sure they're okay (for other tests).
Assignee | ||
Comment 42•3 years ago
|
||
since there isn't a rush on this, I'll ask QTS to disable SIP later this afternoon -- I got auto-banned again yesterday (and it took me a few hours to realize it combined with my cable outage. I really need to sort the auto-ban out with security), and since today is quiet/well I'll wait to ping anyone (the ban should clear this afternoon).
Assignee | ||
Comment 43•3 years ago
|
||
I asked QTS to disable SIP on the rest of the 72 r8 macminis in the IT41 rack (where the test set are). So that will apply to macmini-r8-{73..144}. That leaves macmini-r8-{{1..72},{145..288}} (about 208 to go, skipping builders).
When they're done, I'll apply the tcc.db insert for bash to screencapture and then make sure they are taking production tasks. If there are no problems, next week I'll step through the rest of the pool with qts.
Assignee | ||
Comment 44•3 years ago
|
||
QTS disabled SIP on the range of production mac workers, macmini-r8-{73..144}. I confirmed it is disabled this morning. It looks like the workers in this range are completing tasks (green), https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1015-r8.
If there are no surprise failures by Monday, then I'll ask QTS to disable SIP on another batch.
Assignee | ||
Comment 45•3 years ago
|
||
I asked QTS to disable SIP on macmini-r8-{{145..194},{201..216}} (top half of racks IT42,IT44).
Once set on those, we'll be at about half.
Assignee | ||
Comment 46•3 years ago
|
||
I've asked QTS to disable SIP on the remainder (2 racks, 144) of the r8 mac minis: macmini-r8-{{1..72},{217..288}}
Assignee | ||
Comment 47•3 years ago
|
||
QTS was faster than expected (much faster than the sla) on the other requests, but hasn't disabled sip on this last batch yet. I checked over all of the r8's, and these last two racks still have sip enabled. There are a few scattered ones that also have sip enabled (maybe the disable was incorrect or these flipped back to enabled after a crash or something. I will get them re-disabled and the tcc.db change applied).
I want to set up monitoring of the SIP status and tcc.db contents. -- or rather I should monitor for tcc blocks in the logs (will work with sip enabled, and could show new problems).
Assignee | ||
Comment 48•3 years ago
|
||
QTS finished disabling SIP across all of the R8's yesterday morning. I confirmed all but 3:
macmini-r8-78.test.releng.mdc1.mozilla.com System Integrity Protection status: enabled. "workerType": "gecko-t-osx-1015-r8",
macmini-r8-75.test.releng.mdc1.mozilla.com System Integrity Protection status: enabled. "workerType": "gecko-t-osx-1015-r8",
macmini-r8-273.test.releng.mdc1.mozilla.com System Integrity Protection status: enabled. "workerType": "gecko-t-osx-1015-r8",
I'll ask them to re-disable these and I'll apply the tcc.db change across the entire pool.
Assignee | ||
Comment 49•3 years ago
|
||
I applied the tcc.db insert across the full pool after verifying they all have sip disabled now.
Reporter | ||
Comment 50•3 years ago
|
||
Comment 51•3 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/37656bdc24fc enable osx 10.15 tp6 tests on chrome/chromium. r=perftest-reviewers,AlexandruIonescu
Comment 52•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•