Closed Bug 1522821 Opened 5 years ago Closed 2 years ago

Frequent A promise chain failed to handle a rejection: Win error 2 during operation DirectoryIterator.prototype.next on file C:\Users\task_1548419934\AppData\Local\Temp\tmpsb0lnx.mozrunner\thumbnails (The system cannot find the file specified.

Categories

(Infrastructure & Operations :: RelOps: Windows OS, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: intermittent-failure)

we are working to upgrade windows 10 to the 1803 update (yes, a year old, baby steps). In doing this we find a few tests that don't work. Here we ran each job 3 times and bc6 failed 2/3 times:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b2654dcf6ffb8d7c5db26c84153085a133a6354&searchStr=bc6

What we see in the log is:
13:02:10 INFO - TEST-PASS | browser/components/preferences/in-content/tests/browser_sync_sanitize.js | should have sanitized the browser - 1 == 1 -
13:02:10 INFO - TEST-PASS | browser/components/preferences/in-content/tests/browser_sync_sanitize.js | should not have sanitized Sync - 0 == 0 -
13:02:10 INFO - TEST-PASS | browser/components/preferences/in-content/tests/browser_sync_sanitize.js | should have signed out of FxA - 1 == 1 -
13:02:10 INFO - TEST-PASS | browser/components/preferences/in-content/tests/browser_sync_sanitize.js | should not have cleared the last-fxa-user pref as we only sanitized non-sync - true == true -
13:02:10 INFO - Buffered messages finished
13:02:10 INFO - TEST-UNEXPECTED-FAIL | browser/components/preferences/in-content/tests/browser_sync_sanitize.js | A promise chain failed to handle a rejection: Win error 2 during operation DirectoryIterator.prototype.next on file C:\Users\task_1548419934\AppData\Local\Temp\tmpsb0lnx.mozrunner\thumbnails (The system cannot find the file specified.
13:02:10 INFO - ) - stack: (No stack available.)
13:02:10 INFO - Rejection date: Fri Jan 25 2019 13:02:10 GMT+0000 (Greenwich Mean Time) - false == true - JS frame :: resource://testing-common/PromiseTestUtils.jsm :: assertNoUncaughtRejections :: line 257
13:02:10 INFO - Stack trace:
13:02:10 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:257
13:02:10 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1133
13:02:10 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1099
13:02:10 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:997
13:02:10 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:803
13:02:10 INFO - Leaving test bound testSanitizeBrowser
13:02:10 INFO - Entering test bound testDisconnectAlreadyRunning

it appears we are expecting a thumbnail directory to exist, but one doesn't.

:markh, can you look at this? ok to disable on windows?

Flags: needinfo?(markh)
Blocks: 1522896

I don't think this error is directly related to the test. What I suspect is happening is that as part of "sanitizing" browser data (via Sanitizer.jsm), we may be removing some captured thumbnails - and it's probably the thumbnail code itself, and possibly PageThumbsWorker.js, which is trying to iterate over a (now removed) thumbnail directory. However, the lack of a stack in the failing promise chain makes this difficult to know for sure. So I suspect there is some timing issue when the test is run in this environment.

I've no objection to disabling this test on Windows, but I suspect that will just be masking a real issue in the thumbnail code - although I suspect that "real issue" is probably benign and just causes console spam.

(FWIW, running that test stand alone on Windows 10 works fine. If you can suggest a way I could reproduce it locally I'd be happy to try and dig into what thumbnail related code is involved here)

Flags: needinfo?(markh)

thanks :markh, hopefully this is something we can get fixed, or maybe we find that 1903 (due to come out soon) it is fixed :)

This hits more tests, converting it to a more general bug.

Component: Sync → New Tab Page
Summary: browser_sync_sanitize.js is failing frequently on upgraded windows 10 → Frequent A promise chain failed to handle a rejection: Win error 2 during operation DirectoryIterator.prototype.next on file C:\Users\task_1548419934\AppData\Local\Temp\tmpsb0lnx.mozrunner\thumbnails (The system cannot find the file specified.

There are also some other failure about missing files:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=227599120&repo=mozilla-central&lineNumber=879

12:59:53 INFO - Running post-action listener: _package_coverage_data
12:59:53 ERROR - Exception during post-action for run-tests: Traceback (most recent call last):
12:59:53 ERROR - File "Z:\task_1549888690\mozharness\mozharness\base\script.py", line 2040, in run_action
12:59:53 ERROR - method(action, success=success and self.return_code == 0)
12:59:53 ERROR - File "Z:\task_1549888690\mozharness\mozharness\mozilla\testing\codecoverage.py", line 469, in package_coverage_data
12:59:53 ERROR - self.jsvm_dir)
12:59:53 ERROR - File "Z:\task_1549888690\mozharness\mozharness\mozilla\testing\codecoverage.py", line 289, in parse_coverage_artifacts
12:59:53 ERROR - jsvm_files = [os.path.join(jsvm_dir, e) for e in os.listdir(jsvm_dir)]
12:59:53 ERROR - WindowsError: [Error 3] The system cannot find the path specified: 'c:\users\task_1549888690\appdata\local\temp\tmp8m0lu
\.'
12:59:53 INFO - Running post-action listener: _resource_record_post_action

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=227619208&repo=mozilla-release&lineNumber=7759

grenade, any idea what's going on? jmaher, have you seen this in the Try runs?

Flags: needinfo?(rthijssen)
Flags: needinfo?(jmaher)

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=227637722&repo=mozilla-esr60&lineNumber=13736

15:00:22 INFO - Couldn't open file: Z:\tas4989k_5337\buil1549895337\build\application\firefox\xul.dll
15:00:22 INFO - Exception in thread ProcessReader:
15:00:22 INFO - Traceback (most recent call last):
15:00:22 INFO - File "c:\mozilla-build\python\Lib\threading.py", line 801, in __bootstrap_inner
15:00:22 INFO - self.run()
15:00:22 INFO - File "c:\mozilla-build\python\Lib\threading.py", line 754, in run
15:00:22 INFO - self.__target(*self.__args, **self.__kwargs)
15:00:22 INFO - File "Z:\task_1549895337\build\venv\lib\site-packages\mozprocess\processhandler.py", line 1028, in _read
15:00:22 INFO - callback(line.rstrip())
15:00:22 INFO - File "Z:\task_1549895337\build\venv\lib\site-packages\mozprocess\processhandler.py", line 944, in call
15:00:22 INFO - e(*args, **kwargs)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\mochitest\runtests.py", line 2931, in processOutputLine
15:00:22 INFO - message = handler(message)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\mochitest\runtests.py", line 3018, in fix_stack
15:00:22 INFO - message['data'] = self.stackFixerFunction(message['data'])
15:00:22 INFO - File "Z:\task_1549895337\build\venv\lib\site-packages\mozrunner\utils.py", line 268, in stack_fixer_function
15:00:22 INFO - return stack_fixer_module.fixSymbols(line, symbolsPath)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\bin\fix_stack_using_bpsyms.py", line 153, in fixSymbols
15:00:22 INFO - symbol = addressToSymbol(file, address, symbolsDir)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\bin\fix_stack_using_bpsyms.py", line 139, in addressToSymbol
15:00:22 INFO - p = getSymbolFile(file, symbolsDir)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\bin\fix_stack_using_bpsyms.py", line 128, in getSymbolFile
15:00:22 INFO - symfile = guessSymbolFile(file, symbolsDir)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\bin\fix_stack_using_bpsyms.py", line 119, in guessSymbolFile
15:00:22 INFO - uuid = findIdForPath(full_path)
15:00:22 INFO - File "Z:\task_1549895337\build\tests\bin\fix_stack_using_bpsyms.py", line 103, in findIdForPath
15:00:22 INFO - (path, e.output))
15:00:22 INFO - Exception: Error getting fileid for Z:\tas4989k_5337\buil1549895337\build\application\firefox\xul.dll:
15:00:29 INFO - TEST-INFO | Main app process: exit 0
15:00:29 ERROR - 804 ERROR TEST-UNEXPECTED-FAIL | Shutd

i suspect the problem is related to different routines using different mechanisms to find the users profile or temp directory.

i note that the path of the file that is not found (aside from the one mentioned in comment 8, which may be a different issue) is generally below `C:\Users\task_xxx\AppData\Local\Temp.

the problem which held us back from the 1803 update for a long time was that windows 1803 does not allow generic worker to run SHSetKnownFolderPath for the user profile directory (see bug 1493688, comment 21)

the difference between 1703 and 1803 on our infra (aside from the os upgrade) is that SHSetKnownFolderPath has not been run successfully for the user profile.

my guess is that the file exists under Z:\task_xxx\AppData\Local\Temp because whatever put it there uses a different algorithm to determine the path to the user profile than the process that later attempts to find the file.

Flags: needinfo?(rthijssen)

I do not recall seeing the in the dozens of try pushes; this would have been a more frequent if not persistent error and what I was seeing was specific test failures that didn't look like stack traces or file access.

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10)

I do not recall seeing the in the dozens of try pushes; this would have been a more frequent if not persistent error and what I was seeing was specific test failures that didn't look like stack traces or file access.

yes, i'm kinda stumped by that too. i don't have an explanation.

if the processes which are failing to find the user profile are using SHGetKnownFolderPath to determine the path to the profile directory, it might be that we need to do something hacky to make the tests work. a temporary solution might be to have the test setup run something like this (create a symlink from the task directory to the user directory AppData folder):

cmd /c mklink %cd%\AppData C:\Users\%USERNAME%\AppData

i say temporary because it would be better if we could fix whatever is preventing generic-worker from running SHSetKnownFolderPath

Has only failed on our own hardware (moonshots in the past, now Yoga notebooks).

Component: New Tab Page → RelOps: Windows OS
Product: Firefox → Infrastructure & Operations

backlog cleanup. re-open if this pops up again.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.