Closed Bug 1394757 Opened 3 years ago Closed 3 years ago

unable to run mochitest clipboard tests on virtual machines due to additional clipboard provider

Categories

(Testing :: Mochitest, enhancement, P1)

53 Branch
enhancement

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

(Whiteboard: [PI:November])

Attachments

(2 files, 3 obsolete files)

when we moved mochitests from hardware to virtual machines (bug 1270962), we found that the clipboard tests were failing quite often and upon further investigation we determined that the virtual machine manager added a secondary clipboard provider/manager into windows.

This affected our tests and our solution was to run on hardware.  With our new hardware refresh and datacenter move, the new machines we will have do not have driver support for the native OS, so we will be running one machines <-> one VM- but this means the clipboard tests will be failing.

There is prior discussion in bug 1270962 between comment 27 and comment 51.

We need to revisit this problem and come up with a solution.
:rstrong, I filed this in shell integration, I am not sure if clipboard is really shell integration or some other component- if you could help me find the right place I would apprecate that.
Flags: needinfo?(robert.strong.bugs)
Yep, the Firefox shell integration doesn't do anything with the clipboard. I typically file these sort of bugs under the tests component.
Flags: needinfo?(robert.strong.bugs)
Joel ^^
Flags: needinfo?(jmaher)
thanks :rstrong.

In the past I had emailed Neil, he is on a long PTO, lets see if :ddurst has recommendations on finding the right team/people to get this in their queue.
Component: Shell Integration → Mochitest
Flags: needinfo?(jmaher) → needinfo?(ddurst)
Product: Firefox → Testing
Neil's back from long PTO (now on regular-length PTO), and back next week.

As he has NIs blocked still, I'll assign it to him so he sees it when he's back.
Assignee: nobody → enndeakin
Flags: needinfo?(ddurst)
On Windows it is fairly simple to add a listener to watch for clipboard changes. See

https://msdn.microsoft.com/en-us/library/windows/desktop/ms649033(v=vs.85).aspx 

I'd suggest adding a method to nsIClipboardService to be notified when the next clipboard has changed.
:ahal, would you be able to have time next week to look into this a bit?
Flags: needinfo?(ahalberstadt)
I'm still fairly confused about the problem being solved here, but sounds like we need to:

1. Implement https://w3c.github.io/clipboard-apis/#clipboard-event-clipboardchange
2. Modify SimpleTest.waitForClipboard to use these notifications instead of waiting on a loop

I definitely don't mind looking into this, but it's been about ~8 years since I last wrote a line of cpp, so if this is time critical it would be better to find someone experienced in this area.

That said I did a little digging. It looks like we already implement the cut/copy/paste events, just not the clipboardchange event. The former are defined here:
http://searchfox.org/mozilla-central/source/dom/events/EventNameList.h#472

To send this new event, I guess we have to register for the Windows notifications in comment 6 and create a 'clipboardchange' event anytime we receive WM_CLIPBOARDUPDATE.

Would implementing 'clipboardchange' on Windows only to start be acceptable? Or is it required on all platforms. I haven't checked to see if chromium supports it yet. I guess implementing this event isn't strictly necessary either, we could stick to a helper method.

Joel what's the timeframe here? Maybe we can do something hacky like bug 270962 comment 51 (option 1) for the time being and file a follow-up to implement notifications properly.
Flags: needinfo?(ahalberstadt)
Since it isn't possible to implement a clipboard change notification on all platforms without polling, I would use the helper method.
:ahal, a medium timeframe, but I don't think p1, maybe something to pick up in October- possibly using the clipboardchange event in the shorter time if it is easy.
Whiteboard: [PI:October]
Neil- can you specify which helper method we should use- I am happy to fix anything inside of mochitest, but if this is a browser level change, I would prefer if you could handle it.  As it stands I am not sure what the next steps are here, can you help out?
Flags: needinfo?(enndeakin)
This patch adds a clipboard change listener on Windows. Please test on whatever configuration and we'll see if the issue still remains.

The patch needs some polishing up and it still fails on one test: test_aboutmemory.xul in debug Windows.
Flags: needinfo?(enndeakin)
Thanks for the patch Neil, I pushed this to try to run tests on VM's instead of hardware:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a693b663ca79f66bfeefb67866ea05e9887d5657

windows 10 is looking great, windows7 is problematic.  Sadly windows 7 doesn't fail on the same test (or few), it seems to fail randomly which indicates we might not have fixed this there. 

Neil, do you think there might be changes needed in mochitest to take better use of your patch?
Flags: needinfo?(enndeakin)
Priority: -- → P1
Sorry for the delay here. I have been working on this but it is hard to debug when different tests fail in different ways each run.

I have expanded the patch to support multiprocess and needed to investigate some leaks.

In addition several tests do odd things, or don't wait for the waitForClipboard method to complete being finishing. Even the reference test test_sanitySimpletest.html calls waitForClipboard then checks the result without waiting for it to finish.

Mostly the remaining tests are editor related tests.
Flags: needinfo?(enndeakin)
thanks for the update Neil- do you need help hacking on test cases or looking for new failures?
Sure. Here are the tests that still fail:

dom/browser-element/mochitest/test_browserElement_oop_CopyPaste.html
dom/browser-element/mochitest/test_browserElement_inproc_CopyPaste.html
  - these are the same issue as they both share the same code. Single process failure only.

dom/base/test/test_copypaste.html
  - intermittent?

dom/tests/mochitest/general/test_bug1012662_editor.html
  - times out then crashes, multiprocess and single-process, Windows 7 and 10

editor/libeditor/tests/test_bug525389.html
  - doesn't use waitForClipboard. I suspect other tests that don't might intermittently fail.

There is a build with lots of extra logging at:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=da9813fb544cc883befd902dff9f90660de4c4ed
So test_bug1012662_editor.html is performing a cut on text that isn't deletable, and then just waiting for the polling to timeout. With a clipboard change notification there isn't any polling any more so I'll need to add something or this test will need to be changed.

test_browserElement_oop_CopyPaste.html is failing in a similar way during a cut so it might be the same issue.
I see a lot of progress on the clipboard tests- all green on windows 10 right now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=baa128100b6a5f0becca5da5bffe1c2f2b002dd3&filter-searchStr=clipboard

Neil, do you think that is something we can land, then tackle the remaining issues on windows 7, or is this just testing some theories?
Flags: needinfo?(enndeakin)
Yes, I think it works ok on Windows 10. I can make a non-debugging patch if just running on Windows 10 is acceptable.
Flags: needinfo?(enndeakin)
we will need windows7 to be green as well, this seems to help some of the tests on windows 7 and will help us get closer to windows 10 :)
This is the patch with no debugging logging. It also excludes the changes to taskcluster/ci/test/mochitest.yml

It seems to pass on Windows 10.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2519e4bf6e8e92366ba0fe90470f0273a9d03ee1
Attachment #8917817 - Attachment is obsolete: true
Windows 10 builds work fine without the clipboard changes as well:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=790aa21c3d2d3d1b43b8e4e1dfaddc053e8eb498

So this patch doesn't accomplish much for Windows 10, although it does fix/improve some tests.

On Windows 7, a quick glance suggests that the failures are more stable with my changed.
Comment on attachment 8924779 [details] [diff] [review]
clipboardchange-debug

Review of attachment 8924779 [details] [diff] [review]:
-----------------------------------------------------------------

some comments on the patch.  We have had win10 failures in the recent past when running the clipboard tests on a VM, so this is helping- possibly now other changes are allowing those tests to run green.  I also see fewer win7 failures with this patch.

::: editor/libeditor/tests/test_bug596001.html
@@ +43,5 @@
>      callback
>    );
>  }
>  
> +SimpleTest.requestCompleteLog();

I believe requestCompleteLog is not needed.

::: taskcluster/ci/test/mochitest.yml
@@ -183,4 @@
>      instance-size: xlarge
> -    worker-type:
> -        by-test-platform:
> -            windows7-32.*: buildbot-bridge/buildbot-bridge

we will need to leave this in place for win7

::: testing/mochitest/tests/SimpleTest/SimpleTest.js
@@ +1009,4 @@
>              return;
>          }
>  
> +        function failed()

I think we should have a more descriptive name for this function, maybe clipboardFailed()
Whiteboard: [PI:October] → [PI:November]
Neil, do you have any updates on this bug?  I would like to see what we have landed and get windows 7 working.  Do you have additional data or timelines?
Flags: needinfo?(enndeakin)
Right now, there are a few test failures:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=41e4569c222e6a425303622a40113fbab0a18c69

These three don't use waitForClipboard and would likely need to be modified. The first two fail frequently:
  dom/tests/mochitest/general/test_clipboard_events.html
  dom/base/test/test_copypaste.xul
  editor/libeditor/tests/test_bug525389.html

This one occasionally fails, but does some weird things like perform clipboard operations while verifying other clipboard operations. I suspect fixing it up may fix this test:
  toolkit/components/aboutmemory/tests/test_aboutmemory2.xul

This one I have only seen on the latest patch and affects Windows 10; it may be specific to it:
  devtools/client/inspector/shared/test/browser_styleinspector_context-menu-copy-urls.js

These three only fail in non-e10s as far as I can tell, but frequently:
  dom/browser-element/mochitest/test_browserElement_oop_CopyPaste.html
  dom/browser-element/mochitest/test_browserElement_inproc_CopyPaste.html
  dom/base/test/test_bug116083.html

So I suspect that we should be able to fix up the four tests and hopefully enable on Windows 10 and perhaps Windows7 multiprocess. I'll try fixing up those tests tomorrow.
Flags: needinfo?(enndeakin)
wow, that is getting very green!
My latest attempt shows no oranges on Windows 7 e10s after 22 runs:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4d2feb816cd108960507be42102d29886f6b7dd

However a build with all the debugging removed fails extensively:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7283d462f05ff7a176fa2d2fa5145ac3a738ef31

I am going to try removing bits at a time to see what happens.
:ahal, this is a patch which I admit is hacky, but I am unable to get 'psutil' access in the mozharness scripts.  I do see many of these changes a net win for the mochitest code- open to other ideas, etc.

here is a try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c21c2339600e9f5661cdebc7a071dc541af27ba3
Attachment #8930483 - Flags: review?(ahalberstadt)
Comment on attachment 8930483 [details] [diff] [review]
remove Xen related process from running to avoid clipboard race conditions

Review of attachment 8930483 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mochitest/runtests.py
@@ +331,5 @@
> +
> +    if HAVE_PSUTIL:
> +        """Kill a process tree (including grandchildren) with signal
> +        "sig" and return a (gone, still_alive) tuple.
> +        "on_terminate", if specified, is a callabck function which is

Update this comment -- there are no "sig" or "on_terminate" parameters.

@@ +1684,5 @@
>                  try:
>                      if proc.name() == pname:
>                          procd = proc.as_dict(attrs=['pid', 'ppid', 'name', 'username'])
>                          if proc.ppid() == 1 or not orphans:
> +                            self.log.info("killing %s (pid %d)" % (pname, proc.pid))

I don't mind if you reformat this message, but please do not remove information -- I want to see the ppid and username too.

@@ +1689,5 @@
>                              killPid(proc.pid, self.log)
>                          else:
>                              self.log.info("NOT killing %s (not an orphan?)" % procd)
> +                except Exception as e:
> +                    self.log.info("Error: Unable to kill process %s: %s" % (pname, str(e)))

I wouldn't use the word "Error" here since it is sometimes expected.

@@ +2555,5 @@
> +                                   'XenTools',
> +                                   'XenDPriv.exe')
> +            try:
> +                if os.path.isfile(xenpath):
> +                    os.remove(xenpath)

This seems like a bad thing to do to people who are running mochitests locally.
Comment on attachment 8930483 [details] [diff] [review]
remove Xen related process from running to avoid clipboard race conditions

I am going to push harder on a mozharness solution instead of a per harness solution.
Attachment #8930483 - Flags: review?(ahalberstadt)
ok, round two, this time inside of mozharness instead of the test harness.
Attachment #8930483 - Attachment is obsolete: true
Attachment #8930585 - Flags: review?(gbrown)
Comment on attachment 8930585 [details] [diff] [review]
remove Xen related process from running to avoid clipboard race conditions

Review of attachment 8930585 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing major here, but let's go through another review cycle once you've considered these suggestions.

::: testing/mochitest/runtests.py
@@ +334,5 @@
> +        if pid == os.getpid():
> +            raise RuntimeError("Error: trying to kill ourselves, not another process")
> +        try:
> +            parent = psutil.Process(pid)
> +            children = parent.children(recursive=True)

Same concern as in desktop_unittest -- log every child terminated?

::: testing/mozharness/scripts/desktop_unittest.py
@@ +666,5 @@
> +    def _kill_proc_tree(self, pid, include_parent=True,
> +                        timeout=None, on_terminate=None):
> +        """Kill a process tree (including grandchildren) with signal.SIGTERM
> +        and return a (gone, still_alive) tuple.
> +        "on_terminate", if specified, is a callabck function which is

s/callabck/callback/

@@ +674,5 @@
> +        import psutil
> +        if pid == os.getpid():
> +            return (None, None)
> +        parent = psutil.Process(pid)
> +        children = parent.children(recursive=True)

In the general case, the descendants of 'parent' may include unexpected processes. It might be worth logging the names and/or pids of each descendant terminated.

@@ +683,5 @@
> +        gone, alive = psutil.wait_procs(children, timeout=timeout,
> +                                        callback=on_terminate)
> +        return (gone, alive)
> +
> +    def _kill_named_proc(self, pname, orphans=True):

Since you are only calling with orphans=False, you could simplify this function.

@@ +684,5 @@
> +                                        callback=on_terminate)
> +        return (gone, alive)
> +
> +    def _kill_named_proc(self, pname, orphans=True):
> +        import psutil

In report_system_info(), I encountered infrequent intermittent failures on import of psutil and put it inside of a try/except for that reason. I don't recall if Windows was affected...you might not need to take action - just fyi.

@@ +691,5 @@
> +                if proc.name() == pname:
> +                    procd = proc.as_dict(attrs=['pid', 'ppid', 'name', 'username'])
> +                    if proc.ppid() == 1 or not orphans:
> +                        self.info("killing %s" % procd)
> +                        self._kill_proc_tree(proc.pid)

I strongly suggest calling with a timeout -- give it a minute or two, but don't wait forever!

...and probably check the return value (expect alive==None; fail if unexpected).
Attachment #8930585 - Flags: review?(gbrown)
thanks for the suggestions :gbrown, I have implemented those and things are still looking good on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e223033f541ac952f57f45d02be390199f9446
Attachment #8930585 - Attachment is obsolete: true
Attachment #8930848 - Flags: review?(gbrown)
Comment on attachment 8930848 [details] [diff] [review]
remove Xen related process from running to avoid clipboard race conditions

Review of attachment 8930848 [details] [diff] [review]:
-----------------------------------------------------------------

Please be sure to add a timeout to wait_procs() in runtests.py.

::: testing/mochitest/runtests.py
@@ +337,5 @@
> +            parent = psutil.Process(pid)
> +            children = parent.children(recursive=True)
> +            children.append(parent)
> +            for p in children:
> +                p.send_signal(signal.SIGTERM)

Is p.send_signal(SIGTERM) the same as p.terminate()? Is one better than the other?

http://psutil.readthedocs.io/en/latest/#psutil.wait_procs has an example using terminate/wait_procs/kill -- another approach worth considering.

@@ +338,5 @@
> +            children = parent.children(recursive=True)
> +            children.append(parent)
> +            for p in children:
> +                p.send_signal(signal.SIGTERM)
> +            gone, alive = psutil.wait_procs(children)

Oops - still calling wait_procs() without a timeout (despite the logging that follows!).

::: testing/mozharness/scripts/desktop_unittest.py
@@ +725,5 @@
> +        try:
> +            if os.path.isfile(xenpath):
> +                os.remove(xenpath)
> +        except Exception as e:
> +            self.info("Error: Failure to remove file %s: %s" % (xenpath, str(e)))

Consider using self.error here.
Attachment #8930848 - Flags: review?(gbrown) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/47bd7a73705f
remove XenDPriv.exe from VM to remove clipboard interference. r=ahal
Neil, should I leave this open for the work on cleaning up the test cases- let me know.
Flags: needinfo?(enndeakin)
I think we can open another bug for that.
Flags: needinfo?(enndeakin)
Assignee: enndeakin → jmaher
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/47bd7a73705f
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
See Also: → 1399401
You need to log in before you can comment on or make changes to this bug.