Closed Bug 1373551 Opened 7 years ago Closed 5 years ago

Intermittent "How do you want to open this file?" dialog on Windows 10 test machines

Categories

(Infrastructure & Operations :: RelOps: Windows OS, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: johannh, Assigned: grenade)

We recently enabled screenshot comparison testing on Windows 10, but unfortunately there's this dialog intermittently showing up in the middle of our screenshots:

https://public-artifacts.taskcluster.net/e64PfNZtTK-Btd5LkkAGhg/0/public/test_info/20170614013944-permissionPrompts_04_noLWT_shareScreen.png

This leads to a lot of falsely reported differences such as:

https://screenshots.mattn.ca/comparisons/mozilla-central/b266a8d8fd595b84a7d6218d7b8c6b7af0b5027c/mozilla-central/ad3f1138ce6f199408ad58d65c7476636e924909/windows10-64-vm/permissionPrompts_04_noLWT_shareScreen.png

You can find the whole gallery if you uncheck "known inconsistencies" on https://screenshots.mattn.ca/compare/?oldProject=mozilla-central&oldRev=b266a8d8fd595b84a7d6218d7b8c6b7af0b5027c&newProject=mozilla-central&newRev=ad3f1138ce6f199408ad58d65c7476636e924909&filter=windows10

This effectively prevents us from enabling automatic reporting on Windows 10.

Since it's intermittent (and mozscreenshots is not opening a .txt file to my knowledge), I suspect it's not the test's fault.

It would be interesting to know which file it's trying to open.

The screenshots are from this job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=b266a8d8fd595b84a7d6218d7b8c6b7af0b5027c&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=ss&selectedJob=106884521
Assignee: relops → rthijssen
:ahal, could you help figure out what might be causing this? Possibly this is something outside of the tests and has nothing to do with them, but I suspect it is related to the tests running or to something in the Firefox browser session.
Flags: needinfo?(ahalberstadt)
Hm, this isn't ringing any bells and I can't think of any time Firefox would randomly try to open a text file unsolicited. To be clear, the tests aren't trying to open anything? You could check mimeTypes.rdf in the profile for any suspicious handling of .txt. Maybe it could be set to always save to disk or something.

Or maybe setting browser.download.loglevel=info might provide some useful info in the Gecko logs?
Flags: needinfo?(ahalberstadt)
I haven't spent a lot of time looking into this yet. My reasons for thinking this is not coming from our tests are:

- There's no txt in mozscreenshots: https://searchfox.org/mozilla-central/search?q=txt&path=mozscreenshots
- The download icon isn't blue, which it would be if we downloaded anything in this session.
- Shouldn't it show the Firefox "Open With" dialog instead if it was opened with Firefox?
- As I mentioned, it's intermittent.
It's intermittent, and cross-suite (see also bug 1373567, I'd be astonished if a jsreftest tries to open a text file). Much more likely to be one of "something in the pre-run setup intermittently fails in a way that tries to open a text file" or "something that only runs on the first pre-run setup of a new instance always tries to open a text file."
Blocks: 1332945
FWIW I think this is also failing my browser_contextmenu_touch.js test https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f78950e60fe0c0b6570a9be6a3f2178f863371d&selectedJob=113820104

Rob, can you let us know how this is progressing? Having a consistent Windows 10 test environment is getting more and more important for Photon development as we progress.
Flags: needinfo?(rthijssen)
apologies for the delay on sorting this. i'm hoping that explicitly setting notepad as the default text editor will prevent this dialogue from popping up. https://github.com/mozilla-releng/OpenCloudConfig/commit/efad455766c55d8166b984591adbf901b39f30ec
Flags: needinfo?(rthijssen)
Ah, cool, thanks! Can we expect this to trickle into the normal build architecture now (i.e. are you looking for feedback on whether this works in the mentioned test cases) or is that an additional step? :)
yes, i'm hoping to see that we don't get further occurrences of the popup mentioned in comment 0. the change in OpenCloudConfig is propagated to the win 10 testers.
I want to make sure we're on the same page here... :johannh are you testing on buildbot hardware testers or are you using taskcluster testers? The changes rob is making will only show up on taskcluster. Rob, can you please make sure this also gets ported to the w10 buildbot testers since that's what people are currently using for production talos tests?
Flags: needinfo?(jhofmann)
I'm on taskcluster AFAUI :)
Flags: needinfo?(jhofmann)
tried and failed to reproduce:
- created a loaner (https://tools.taskcluster.net/groups/Gwx3p81ET7SCGswa0fjD8g/tasks/Gwx3p81ET7SCGswa0fjD8g/details)
- rdp'd to loaner and created a new user account without admin privs, closed connection
- rdp'd to loaner as unprivileged user and ran following commands (derived from failed task in comment 11):
z:
mkdir task_grenade && cd task_grenade
wget https://queue.taskcluster.net/v1/task/SNsrvZ91SA21R-FN1-Td9A/runs/0/artifacts/public/build/mozharness.zip
7z x mozharness.zip
c:\mozilla-build\python\python.exe -u mozharness\scripts\desktop_unittest.py --cfg mozharness\configs\unittests\win_taskcluster_unittest.py --mochitest-suite=browser-chrome-screenshots --no-read-buildbot-config --installer-url https://queue.taskcluster.net/v1/task/SNsrvZ91SA21R-FN1-Td9A/artifacts/public/build/target.zip --test-packages-url https://queue.taskcluster.net/v1/task/SNsrvZ91SA21R-FN1-Td9A/artifacts/public/build/target.test_packages.json --download-symbols ondemand --mochitest-suite=browser-chrome-screenshots
- tests ran and failed but for other reasons. there were no screenshots containing the popup for txt file association and checking file associations by double clicking on a .txt file in explorer resulted in the file being opened by notepad.exe without a prompt.

not sure where to go at this point as the problem appears to be intermittent. if we knew what process was attempting to open a text file and what text file, that would be useful too. generic-worker on this worker type (gecko-t-win10-64) is an older version (8.3.0) which runs under the GenericWorker user account. so i'll also attempt to set file associations for that user (https://github.com/mozilla-releng/OpenCloudConfig/commit/fe1f4bb718d00c10e57574af02460b55943738f8). i guess we'll see in the coming days if the problem persists.
Blocks: 1326419
See Also: → 1382988
:grenade, can you look into this again? it seems as though we are still getting many instances of this (historically in screenshots and marionette, now in jsreftests), apparently when we start up the browser.
Flags: needinfo?(rthijssen)
apologies for the delay, i will look into this again on monday
:grenade, how is this investigation coming along?  we continue to find more tests that are failing and suspect this is the root cause based on the screenshots.
Blocks: 1357082
still scratching my head on this one.

i created a task [1] on windows 10 which just runs the commands `assoc` and `ftype` which show us (assoc) what filetype an extension is associated to and (ftype) what program is configured to open each filetype.

the task log shows that:
- .txt extensions are correctly associated with file type txtfile (.txt=txtfile)
- txtfile file types are configured to be opened with notepad (txtfile="C:\Windows\System32\Notepad.exe" "%1").

the screenshot suggests that windows doesn't know what to open .txt files with, but the only mechanism i know of for setting or checking the file associations confirms that the file association is correctly set.

[1]: https://tools.taskcluster.net/groups/N9RKmbmqROmVWKSbU9GZ0Q/tasks/N9RKmbmqROmVWKSbU9GZ0Q/runs/0
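
The two-step lookup that `assoc` and `ftype` expose (extension to file type, then file type to open command) can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the sample strings reuse the two values quoted from the task log rather than a full command output.

```python
from typing import Dict, Optional


def parse_mapping(output: str) -> Dict[str, str]:
    """Parse `key=value` lines in the form printed by `assoc` or `ftype`."""
    mapping = {}
    for line in output.splitlines():
        # Split on the first "=" only; the command part may contain more.
        key, sep, value = line.strip().partition("=")
        if sep:
            mapping[key.lower()] = value
    return mapping


def handler_for(extension: str, assoc_output: str, ftype_output: str) -> Optional[str]:
    """Return the open command for an extension, or None if a link is missing."""
    file_type = parse_mapping(assoc_output).get(extension.lower())
    if file_type is None:
        return None
    return parse_mapping(ftype_output).get(file_type.lower())


# Sample lines matching the two values reported in the task log.
ASSOC_SAMPLE = ".txt=txtfile"
FTYPE_SAMPLE = 'txtfile="C:\\Windows\\System32\\Notepad.exe" "%1"'

if __name__ == "__main__":
    print(handler_for(".txt", ASSOC_SAMPLE, FTYPE_SAMPLE))
```

A `None` result would mean one of the two links is missing. Here both links resolve, which matches the observation that the association looks correct even though the dialog still appears.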
Flags: needinfo?(rthijssen)
found a comment on the web suggesting that if windows detects a new program installed that has capability to open the filetype, it causes the popup. the registry setting below is supposed to prevent this

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Explorer]
"NoNewAppAlert"=dword:00000001

testing now:
https://github.com/mozilla-releng/OpenCloudConfig/commit/9df98cb867b1e47da36c390c7dcb32b75d7603ee
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Rob Thijssen (:grenade - UTC+3) from comment #24)
> found a comment on the web suggesting that if windows detects a new program
> installed that has capability to open the filetype, it causes the popup. the
> registry setting below is supposed to prevent this

Interesting... but why is there a new program installed during testing?
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #26)
> Interesting... but why is there a new program installed during testing?

there isn't. it happens during the instance setup. it's just recognising a change from default (notepad only) setup at any point. not specifically during the test run.
pmoore: some of the failures being associated with this bug actually look like this:
https://public-artifacts.taskcluster.net/RajWsgBMRuab7kL4wxTGXw/0/public/test_info/mozilla-test-fail-screenshot_dc4jdj.png

as you can see, it's a different dialog to the "how do you want to open this file" popup. the screen obscuring the browser test is actually the cmd screen that launches generic worker.

the output shown on the screen is from this command:
https://github.com/mozilla-releng/OpenCloudConfig/blob/aaee6a1b/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat#L3

i'm confused as to why the script which launches generic worker is visible while generic worker is running. it seems like this is an intermittent issue rather than a persistent one. i'm guessing that it's not the same instance of the script as the one actually running gw (eg: it's a subsequent run). perhaps we need some logic in this script that allows only a single running instance?

any ideas?
Flags: needinfo?(pmoore)
No ideas - although this should go away with bug 1399401.
Depends on: 1399401
Flags: needinfo?(pmoore)
The reason I say this, is that generic-worker 10 runs as a windows service, rather than a login script triggered by a scheduled task.

I'm guessing this worker type runs as current user too. I think not running as current user would also solve this, but I'm guessing there are reasons we can't do this.

There may be other ways to fix this rather than the solution in bug 1399401 but I think getting that bug out is the best way forward as it simplifies (and standardises) our setup in so many ways. However, if that bug takes too long to roll out, there may be other approaches.

I'm at a rust training this week in Paris, so won't get a chance to look at this properly until at least next week.

the "How do you want to open this file?" dialog has resurfaced on windows 10 arm64 instances. with a little debugging (selecting notepad as the file association), i see that the file in question contains the following text:

#!/bin/sh
# Copyright (C) 2002, Earnie Boyd
#   mailto:earnie@users.sf.net
# This file is part of Minimal SYStem.
#   http://www.mingw.org/msys.shtml
# File: cmd
"$COMSPEC" "$@"

i'm guessing that something in the mozilla-build install (the installer contains msys) is triggering this.

i will do some more debugging and try to find a way to squash this.

Blocks: 1520432
Status: RESOLVED → REOPENED
Component: RelOps: General → RelOps: Windows OS
Resolution: FIXED → ---

after a little more debugging, i determined that:

  • the file-path being opened during the mozilla-build/msys install is C:\mozilla-build\msys\bin\cmd which contains:

    #!/bin/sh
    # Copyright (C) 2002, Earnie Boyd
    #   mailto:earnie@users.sf.net
    # This file is part of Minimal SYStem.
    #   http://www.mingw.org/msys.shtml
    # File: cmd
    "$COMSPEC" "$@"
    
  • crucially, the system PATH environment variable incorrectly contained an entry for C:\mozilla-build\msys\bin before the entry for C:\Windows\System32

  • the default value setting of the system COMSPEC environment variable is CMD

these settings create an undesired or unexpected condition when the command cmd is run without a .exe suffix since the incorrect PATH setting causes cmd to be invoked at C:\mozilla-build\msys\bin\cmd instead of C:\Windows\System32\cmd.exe.

the invocation of cmd by some/any process is then not seen as an executable by the operating system since windows does not treat a file as an executable if it does not have an extension that matches one contained in the PATHEXT environment variable. instead, windows treats invocations of non-executables as a request to open a file with a name matching the invocation.

hence on our infra, whenever something invokes cmd on a system where the PATH contains a C:\mozilla-build\msys\bin entry before its C:\Windows\System32 entry, windows assumes we want to open the text file at C:\mozilla-build\msys\bin\cmd instead of invoking C:\Windows\System32\cmd.exe which is normally what the caller expects to happen.
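
the ordering problem described above can be illustrated with a small model. this is a deliberately simplified sketch in Python, not a reproduction of the real windows lookup rules (which vary by API): it assumes the first PATH directory containing either the bare name or the name plus a PATHEXT extension wins, and that a match whose extension is not in PATHEXT is opened as a document. the directory contents, the PATHEXT list and the function name are hypothetical.

```python
import ntpath
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical directory listings, kept in memory so the sketch runs anywhere.
MSYS_BIN = "C:\\mozilla-build\\msys\\bin"
SYSTEM32 = "C:\\Windows\\System32"
FILES: Dict[str, Set[str]] = {
    MSYS_BIN: {"cmd", "sh.exe"},
    SYSTEM32: {"cmd.exe", "notepad.exe"},
}
PATHEXT: List[str] = [".COM", ".EXE", ".BAT", ".CMD"]


def resolve(name: str, path_dirs: List[str], pathext: List[str],
            files: Dict[str, Set[str]]) -> Optional[Tuple[str, str]]:
    """Return (full_path, action) for the first match along PATH, or None.

    action is "execute" when the matched file's extension is in PATHEXT,
    otherwise "open" (handed to the file association mechanism).
    """
    executable_exts = {ext.lower() for ext in pathext}
    for directory in path_dirs:
        # Map lower-cased names to real names for case-insensitive matching.
        present = {f.lower(): f for f in files.get(directory, set())}
        # Simplified rule: bare name first, then name + each PATHEXT entry.
        for candidate in [name] + [name + ext for ext in pathext]:
            real_name = present.get(candidate.lower())
            if real_name is not None:
                ext = ntpath.splitext(real_name)[1].lower()
                action = "execute" if ext in executable_exts else "open"
                return ntpath.join(directory, real_name), action
    return None


if __name__ == "__main__":
    print(resolve("cmd", [MSYS_BIN, SYSTEM32], PATHEXT, FILES))  # msys entry first
    print(resolve("cmd", [SYSTEM32, MSYS_BIN], PATHEXT, FILES))  # System32 first
```

in this model, only the order of the two PATH entries decides whether `cmd` ends up at the extensionless msys script or at cmd.exe.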

the fix is to simply correct the PATH environment variable so that the C:\Windows\System32 entry precedes the C:\mozilla-build\msys\bin entry.

in occ, we use a dsc component to set the PATH environment variable. until now, this was achieved by prepending any entries in the manifest component to whatever was already in the system PATH. eg:

    {
      "ComponentName": "env_PATH",
      "ComponentType": "EnvironmentVariableUniquePrepend",
      "Name": "PATH",
      "Values": [
        ...,
        "C:\\mozilla-build\\msys\\bin",
        ...
      ],
      "Target": "Machine"
    },

would result in a PATH var containing: C:\mozilla-build\msys\bin;whatever-was-already-in-PATH

swapping EnvironmentVariableUniquePrepend for EnvironmentVariableUniqueAppend would have the desired effect on any new system and result in a PATH var containing: whatever-was-already-in-PATH;C:\mozilla-build\msys\bin.

however, i wanted a fix that would also correct any running systems that already have C:\mozilla-build\msys\bin in PATH in the wrong order (preceding C:\Windows\System32). so i modified the implementations of EnvironmentVariableUniquePrepend and EnvironmentVariableUniqueAppend so that they first remove any entries from PATH that also exist in the manifest component, before either appending or prepending all the manifest components to the PATH, resulting in the desired order of PATH components.
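
the remove-then-add behaviour can be sketched like this. the actual change lives in the occ dsc implementation linked below; the Python version here is only an illustration, the function names are hypothetical, and the case-insensitive comparison of entries is an assumption rather than something taken from the patch.

```python
from typing import List


def _split(path_value: str) -> List[str]:
    """Split a semicolon-separated PATH value, dropping empty entries."""
    return [entry for entry in path_value.split(";") if entry]


def _without(entries: List[str], values: List[str]) -> List[str]:
    """Drop every entry that also appears in values (case-insensitive, assumed)."""
    drop = {value.lower() for value in values}
    return [entry for entry in entries if entry.lower() not in drop]


def unique_prepend(path_value: str, values: List[str]) -> str:
    """Remove the manifest values from PATH, then put them at the front."""
    return ";".join(list(values) + _without(_split(path_value), values))


def unique_append(path_value: str, values: List[str]) -> str:
    """Remove the manifest values from PATH, then put them at the end."""
    return ";".join(_without(_split(path_value), values) + list(values))


if __name__ == "__main__":
    # A running system that already has the msys entry in the wrong position.
    stale = "C:\\mozilla-build\\msys\\bin;C:\\Windows\\System32;C:\\Windows"
    print(unique_append(stale, ["C:\\mozilla-build\\msys\\bin"]))
```

because the existing entries are removed first, applying the append variant to a system with the old ordering moves C:\mozilla-build\msys\bin behind C:\Windows\System32 instead of leaving the stale entry in place.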

the fix patch is very simple and is implemented in the aarch64 branch here: https://github.com/mozilla-releng/OpenCloudConfig/commit/27e555c. it will be merged to master in due course

Status: REOPENED → RESOLVED
Closed: 7 years ago → 5 years ago
Resolution: --- → FIXED