Closed Bug 1506888 Opened 3 years ago Closed 2 years ago

Intermittent ..\python\mach\mach\test\test_telemetry.py::test_path_filtering_other_cwd TEST-UNEXPECTED-FAIL

Categories

(Firefox Build System :: Mach Core, enhancement, P5)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: ted)

References

Details

(Keywords: in-triage, intermittent-failure, Whiteboard: [stockwell unknown])

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=211034689&repo=autoland

https://queue.taskcluster.net/v1/task/Sba7H8yoQMergfKz7Xtaxg/runs/0/artifacts/public/logs/live_backing.log

23:00:18     INFO - platform win32 -- Python 2.7.15, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- z:\build\build\src\obj-firefox\_virtualenvs\obj-firefox-sNC2NmI7-2.7\Scripts\python.exe
23:00:18     INFO - rootdir: z:\build\build\src, inifile: z:\build\build\src\config\mozunit\mozunit\pytest.ini
23:00:18     INFO - collecting ... collected 6 items
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py::test_simple PASSED
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering PASSED
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering_in_objdir PASSED
23:00:18  WARNING - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering_other_cwd TEST-UNEXPECTED-FAIL
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py::test_mach_invoke_recursive PASSED
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py::test_registrar_dispatch PASSED
23:00:18     INFO - ================================== FAILURES ===================================
23:00:18     INFO - ________________________ test_path_filtering_other_cwd ________________________
23:00:18     INFO - run_mach = <function run at 0x000000000492F588>
23:00:18     INFO - tmpdir = local('c:\\users\\task_1541886520\\appdata\\local\\temp\\pytest-of-task_1541886520\\pytest-1\\test_path_filtering_other_cwd0')
23:00:18     INFO -     def test_path_filtering_other_cwd(run_mach, tmpdir):
23:00:18     INFO -         srcdir_path = os.path.join(buildconfig.topsrcdir, 'a')
23:00:18     INFO -         srcdir_path_2 = os.path.join(buildconfig.topsrcdir, 'a/b/c')
23:00:18     INFO -         other_path = str(tmpdir.join('other'))
23:00:18     INFO -         data = run_mach('python', '-c', 'pass',
23:00:18     INFO -                         srcdir_path,
23:00:18     INFO -                         srcdir_path_2,
23:00:18     INFO -                         other_path, cwd=str(tmpdir))
23:00:18     INFO - >       assert len(data) == 1
23:00:18     INFO - E       assert 0 == 1
23:00:18     INFO - E        +  where 0 = len([])
23:00:18     INFO - ..\python\mach\mach\test\test_telemetry.py:114: AssertionError
See Also: → 1506759
Duplicate of this bug: 1507264
Duplicate of this bug: 1507268
Keywords: in-triage
I added some more logging for failures and have a try push where I'm trying to reproduce:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=31aa5371a86d89a17a1f6ab092f99859193&selectedJob=212886621
Ted, this is the most recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=215107945&repo=autoland&lineNumber=48664

The job was on Windows 2012 x64 asan build-win64-asan/opt (Bo), i did not see this on your try push. Hope it helps. 


23:08:09     INFO - z:\build\build\src\python\mach\mach\test\test_telemetry.py
23:08:09     INFO - ============================= test session starts =============================
23:08:09     INFO - platform win32 -- Python 2.7.15, pytest-3.6.2, py-1.5.4, pluggy-0.6.0 -- z:\build\build\src\obj-firefox\_virtualenvs\obj-firefox-sNC2NmI7-2.7\Scripts\python.exe
23:08:09     INFO - rootdir: z:\build\build\src, inifile: z:\build\build\src\config\mozunit\mozunit\pytest.ini
23:08:09     INFO - collecting ... collected 6 items
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py::test_simple PASSED
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering PASSED
23:08:09  WARNING - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering_in_objdir TEST-UNEXPECTED-FAIL
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py::test_path_filtering_other_cwd PASSED
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py::test_mach_invoke_recursive PASSED
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py::test_registrar_dispatch PASSED
23:08:09     INFO - ================================== FAILURES ===================================
23:08:09     INFO - ________________________ test_path_filtering_in_objdir ________________________
23:08:09     INFO - run_mach = <function run at 0x00000000044AED68>
23:08:09     INFO - tmpdir = local('c:\\users\\task_1543700899\\appdata\\local\\temp\\pytest-of-task_1543700899\\pytest-1\\test_path_filtering_in_objdir0')
23:08:09     INFO -     def test_path_filtering_in_objdir(run_mach, tmpdir):
23:08:09     INFO -         srcdir_path = os.path.join(buildconfig.topsrcdir, 'a')
23:08:09     INFO -         srcdir_path_2 = os.path.join(buildconfig.topsrcdir, 'a/b/c')
23:08:09     INFO -         objdir_path = os.path.join(buildconfig.topobjdir, 'x')
23:08:09     INFO -         objdir_path_2 = os.path.join(buildconfig.topobjdir, 'x/y/z')
23:08:09     INFO -         other_path = str(tmpdir.join('other'))
23:08:09     INFO -         data = run_mach('python', '-c', 'pass',
23:08:09     INFO -                         srcdir_path, srcdir_path_2,
23:08:09     INFO -                         objdir_path, objdir_path_2,
23:08:09     INFO -                         other_path,
23:08:09     INFO - >                       cwd=buildconfig.topobjdir)
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py:91:
23:08:09     INFO - _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
23:08:09     INFO - args = ('python', '-c', 'pass', 'z:/build/build/src\a', 'z:/build/build/src\a/b/c', 'z:/build/build/src/obj-firefox\x', ...)
23:08:09     INFO - kwargs = {'cwd': 'z:/build/build/src/obj-firefox'}
23:08:09     INFO - path = 'c:\users\task_1543700899\appdata\local\temp\pytest-of-task_1543700899\pytest-1\test_path_filtering_in_objdir0\telemetry\outgoing'
23:08:09     INFO -     def run(*args, **kwargs):
23:08:09     INFO -         # Run mach with the provided arguments
23:08:09     INFO -         subprocess.check_output([sys.executable, mach] + list(args),
23:08:09     INFO -                                 stderr=subprocess.STDOUT,
23:08:09     INFO -                                 env=env,
23:08:09     INFO -                                 **kwargs)
23:08:09     INFO -         # Load any telemetry data that was written
23:08:09     INFO -         path = unicode(tmpdir.join('telemetry', 'outgoing'))
23:08:09     INFO - >       return [json.load(open(os.path.join(path, f), 'rb')) for f in os.listdir(path)]
23:08:09     INFO - E       WindowsError: [Error 3] The system cannot find the path specified: u'c:\\users\\task_1543700899\\appdata\\local\\temp\\pytest-of-task_1543700899\\pytest-1\\test_path_filtering_in_objdir0\\telemetry\\outgoing\\*.*'
23:08:09     INFO - ..\python\mach\mach\test\test_telemetry.py:41: WindowsError
23:08:09     INFO - ===================== 1 failed, 5 passed in 94.58 seconds =====================
23:08:09     INFO - Setting retcode to 1 from z:\build\build\src\python\mach\mach\test\test_telemetry.py
23:08:09     INFO - z:\build\build\src\python\mozbuild\dumbmake\test\test_dumbmake.py
Whiteboard: [stockwell needswork:owner]
I triggered *so many* Windows build jobs on try trying to reproduce this failure with extra logging but was not successful:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a180646f096ae629a96af0f4c2407e6321

I think I'll just get my extra logging patch reviewed and landed and hopefully we can get more information out of the failures that are occurring.
Depends on: 1517244
I looked at OrangeFactor to find some failures since bug 1517244 landed and I did find a couple:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=93bc74cb72cfcf00b16eea08baa9187c3375e74d&selectedJob=219742415
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=76a97803554214cda9eab89b5ef0a4b33380cc20&selectedJob=219766558

The logs both show the same thing:
10:43:49     INFO - Path does not exist: "c:\users\task_1546506674\appdata\local\temp\pytest-of-task_1546506674\pytest-1\test_path_filtering_other_cwd0\telemetry\outgoing"
10:43:49     INFO - Path does not exist: "c:\users\task_1546506674\appdata\local\temp\pytest-of-task_1546506674\pytest-1\test_path_filtering_other_cwd0\telemetry"

So the temp directory exists, but neither the telemetry nor telemetry/outgoing directories exist underneath it, which indicates that telemetry isn't being written for some reason!
Reading through the telemetry collection/submission code it strikes me that we could hit this scenario where we somehow produce data that doesn't validate against the schema, in which case `gather_telemetry` will return None:
https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/python/mozbuild/mozbuild/telemetry.py#277-282

If that happens `post_dispatch_handler` will simply not write telemetry, which would produce the symptoms of this failure:
https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/build/mach_bootstrap.py#266

The tests are running mach via `subprocess.check_output`, but we throw the output away, so any error messages produced by the above code won't make it into the logs:
https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/python/mach/mach/test/test_telemetry.py#35-38

I guess we should save the output and log it if something fails?
Depends on: 1517712

I just looked at a few logs to see what they look like after landing bug 1517712, and they all contain the same error:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=221046852&repo=mozilla-central&lineNumber=98837

11:39:20     INFO - Error loading telemetry. mach output:
11:39:20     INFO - =========================================================
11:39:20     INFO - Build telemetry is invalid:
11:39:20     INFO - value does not match expected format %Y-%m-%dT%H:%M:%S.%fZ for dictionary value @ data[u'time']
11:39:20     INFO - {u'argv': [u'-c', u'pass', u'$topsrcdir/a', u'$topsrcdir/a/b/c', u'other'],
11:39:20     INFO -  u'build_opts': {u'artifact': False,
11:39:20     INFO -                  u'ccache': False,
11:39:20     INFO -                  u'compiler': 'clang-cl',
11:39:20     INFO -                  u'debug': False,
11:39:20     INFO -                  u'opt': True,
11:39:20     INFO -                  u'sccache': False},
11:39:20     INFO -  u'client_id': '306a548d-a045-4ba6-8307-1ac421c94e51',
11:39:20     INFO -  u'command': u'python',
11:39:20     INFO -  u'duration_ms': 437,
11:39:20     INFO -  u'success': True,
11:39:20     INFO -  u'system': {u'cpu_brand': u'Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz',
11:39:20     INFO -              u'logical_cores': 16,
11:39:20     INFO -              u'memory_gb': 30,
11:39:20     INFO -              u'os': u'windows',
11:39:20     INFO -              u'physical_cores': 8},
11:39:20     INFO -  u'time': u'2019-01-10T11:38:20Z'}
11:39:20     INFO - =========================================================

So apparently these failures are bug 1517909, which conveniently I wrote a patch for yesterday! I just landed it on autoland, so I'd expect these failures to stop as that gets merged around.

Depends on: 1517909

I was curious as to why this failure only showed up on Windows so I looked at the Python source for time.time() and it uses this floattime function:
https://github.com/python/cpython/blob/c0a1d73c64cbc3753b6e16f78d2c77a0389fff21/Modules/timemodule.c#L995

Windows Python doesn't have gettimeofday so it uses ftime:
https://github.com/python/cpython/blob/87667c54c6650751c5d7bf7b9e465c8c4af45f71/PC/pyconfig.h#L495-L508

...and as the comment in floattime says, gettimeofday has microsecond resolution, but ftime only has millisecond resolution, so it's just way more likely that you'll get a value with no fractional component on Windows vs. other platforms.

Assignee: nobody → ted
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.