Closed Bug 1270962 Opened 8 years ago Closed 7 years ago

create a new clipboard job, or fix clipboard to work on windows 7 vm

Categories

(Testing :: Mochitest, defect)

defect
Not set
normal

Tracking

(firefox49 fixed, firefox50 fixed, firefox51 fixed)

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

Attachments

(5 files, 3 obsolete files)

we are getting more data about how reliable our windows 7 machines are in the cloud.  Unfortunately the aws instances have a lot of failures on clipboard related tests.

the cause is unknown despite a lot of hacking and investigations.  As a last resort, we will end up with a M(cl) job which is the clipboard job that runs all related clipboard tests.

for reference, here are the clipboard tests that I am aware of:
https://hg.mozilla.org/try/rev/9711e216e791bb759e9570a90734cc3e11a341f5

^ this is pretty inclusive after hundreds of runs of each job and analyzing the failures.

This has tests from:
* mochitest-chrome
* mochitest-browser-chrome
* mochitest-devtools
* mochitest-plain
* 1 from mochitest-jetpack (maybe we live with it for a bit)

luckily there are no tests from wpt, reftest, xpcshell that seem to be showing up.
one idea that Q tried was using clcl: http://www.nakka.com/soft/clcl/index_eng.html.  This was promising outside of firefox, but wasn't helpful in some initial runs via automation.  Possibly there is more to do with clcl or a similar tool.
Depends on: 1271664
Attached patch add new clipboard job (obsolete) — Splinter Review
having issues with list_builder_differences, otherwise I would post the builder diff and ask for review.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Depends on: 1272467
and the builders:
Builders added:
+ Android 4.3 armv7 API 15+ try debug test mochitest-clipboard
+ Android 4.3 armv7 API 15+ try opt test mochitest-clipboard
+ Rev4 MacOSX Snow Leopard 10.6 try debug test mochitest-clipboard
+ Rev4 MacOSX Snow Leopard 10.6 try debug test mochitest-clipboard-e10s
+ Rev4 MacOSX Snow Leopard 10.6 try opt test mochitest-clipboard
+ Rev4 MacOSX Snow Leopard 10.6 try opt test mochitest-clipboard-e10s
+ Rev7 MacOSX Yosemite 10.10.5 try debug test mochitest-clipboard
+ Rev7 MacOSX Yosemite 10.10.5 try debug test mochitest-clipboard-e10s
+ Rev7 MacOSX Yosemite 10.10.5 try opt test mochitest-clipboard
+ Rev7 MacOSX Yosemite 10.10.5 try opt test mochitest-clipboard-e10s
+ Ubuntu VM 12.04 try debug test mochitest-clipboard
+ Ubuntu VM 12.04 try debug test mochitest-clipboard-e10s
+ Ubuntu VM 12.04 try opt test mochitest-clipboard
+ Ubuntu VM 12.04 try opt test mochitest-clipboard-e10s
+ Ubuntu VM 12.04 x64 try opt test mochitest-clipboard
+ Ubuntu VM 12.04 x64 try opt test mochitest-clipboard-e10s
+ Windows 10 64-bit try debug test mochitest-clipboard
+ Windows 10 64-bit try debug test mochitest-clipboard-e10s
+ Windows 10 64-bit try opt test mochitest-clipboard
+ Windows 10 64-bit try opt test mochitest-clipboard-e10s
+ Windows 7 32-bit try debug test mochitest-clipboard
+ Windows 7 32-bit try debug test mochitest-clipboard-e10s
+ Windows 7 32-bit try opt test mochitest-clipboard
+ Windows 7 32-bit try opt test mochitest-clipboard-e10s
+ Windows 7 VM 32-bit try debug test mochitest-clipboard
+ Windows 7 VM 32-bit try debug test mochitest-clipboard-e10s
+ Windows 7 VM 32-bit try opt test mochitest-clipboard
+ Windows 7 VM 32-bit try opt test mochitest-clipboard-e10s
+ Windows 7 VM-GFX 32-bit try debug test mochitest-clipboard
+ Windows 7 VM-GFX 32-bit try debug test mochitest-clipboard-e10s
+ Windows 7 VM-GFX 32-bit try opt test mochitest-clipboard
+ Windows 7 VM-GFX 32-bit try opt test mochitest-clipboard-e10s
+ Windows 8 64-bit try debug test mochitest-clipboard
+ Windows 8 64-bit try debug test mochitest-clipboard-e10s
+ Windows 8 64-bit try opt test mochitest-clipboard
+ Windows 8 64-bit try opt test mochitest-clipboard-e10s
+ Windows XP 32-bit try debug test mochitest-clipboard
+ Windows XP 32-bit try debug test mochitest-clipboard-e10s
+ Windows XP 32-bit try opt test mochitest-clipboard
+ Windows XP 32-bit try opt test mochitest-clipboard-e10s
Attachment #8751879 - Attachment is obsolete: true
Attachment #8752332 - Flags: review?(armenzg)
Comment on attachment 8752332 [details] [diff] [review]
add a clipboard job, to try only for now

Review of attachment 8752332 [details] [diff] [review]:
-----------------------------------------------------------------

I call this a piggyback patch! haha
Attachment #8752332 - Flags: review?(armenzg) → review+
on default:
http://hg.mozilla.org/build/buildbot-configs/rev/ffb96494328b

will get live on the next reconfig.
Depends on: 1273212
(In reply to Joel Maher (:jmaher) from comment #0)
> Unfortunately the aws instances have a lot of failures on
> clipboard related tests. the cause is unknown despite a lot of hacking and investigations.

Do you have a pointer to those investigations? Sure seems unfortunate to end up with a whole separate job running... Is this just a problem with the tests, or does it manifest with even manually using Firefox within in the VM (or is cut'n'paste broken for everything in the VM?
Flags: needinfo?(jmaher)
we investigated this for a while.  cut/paste works in all areas on the VM machines- it just fails intermittently about 25% of the time.  We see this all the time in automation.  In trying to hack around this in gecko and the tests, I was not able to make it work.  We tried using 3rd party clipboard managers (at the OS level) with no luck.  In addition we tested standalone scripts to cut/paste using the OS clipboard in rapid volume and had a 3x increase in misses on the VM vs the hardware.  Running manually with Firefox, I wasn't able to see a clipboard failure.

in bug 1223509 (started at comment 41) we have discussed some of the work on clipboard issues.

If there is a developer familiar with the nsiClipboard who can spare some cycles, that would help.  As it stands to get us migrated, we will keep moving forward here, but I would be very happy to get rid of the clipboard job.
Flags: needinfo?(jmaher)
It seems these issues are also still present on mochitest jobs on try, at least, cf. https://treeherder.mozilla.org/#/jobs?repo=try&revision=927b0c256887&selectedJob=21071860 (seen on multiple trypushes). Is there some way we can hide these from try so that we can work out what orange is actually the result of the trypush in question?
unfortunately there is not a way to make an exclusion in treeherder and hide these jobs on try :(  Since we wanted to make all windows 7 jobs show up on the same line, we don't have a new platform.

Even with clipboard stuff solved, we will still have a small list of misleading failures in mochitest and browser-chrome.

:catlee, how can we not run [Windows 7 VM] by default on try?  This goes back to my ask a couple weeks ago about running gfx only, etc. and not matching everything on [Windows 7].  I am not sure how to do that now that we have some jobs ported over to Windows 7 VM.
Flags: needinfo?(jmaher) → needinfo?(catlee)
We can't unless we switched to Buildbot bridge scheduling from in-tree: bug 1116275.

I'm pretty sure that sheriffs can hide certain builders on try. I would try to do it but I've failed many times trying to do so.
it would be nice if we had:
Windows VM 7
Windows GFX-VM 7

then we could easily determine what gets run on try.  I have had luck hiding builders on try, but i am unable to figure this one out since there is no clear platform to key the builder off of.
I think we have to run this platform by default on try, since they correspond to tier-1 jobs that are running on inbound, etc.
Flags: needinfo?(catlee)
this is the list I can think of, I would rather get a few folks reviewing this as it is not just a couple of tests, but 50+
Attachment #8756030 - Flags: review?(ryanvm)
Attachment #8756030 - Flags: review?(bgrinstead)
Attachment #8756030 - Flags: review?(armenzg)
Comment on attachment 8756030 [details] [diff] [review]
adjust manifests for tests that use the clipboard to be in a subsuite

Review of attachment 8756030 [details] [diff] [review]:
-----------------------------------------------------------------

devtools/ changes look fine to me.  The r=bgrins in the commit message has a typo.
Attachment #8756030 - Flags: review?(bgrinstead) → review+
one small update (typo in paswordmgr for subsite vs subsuite), also fixed commit message to have bgrins
Attachment #8756030 - Attachment is obsolete: true
Attachment #8756030 - Flags: review?(ryanvm)
Attachment #8756030 - Flags: review?(armenzg)
Attachment #8756277 - Flags: review?(ryanvm)
Attachment #8756277 - Flags: review?(armenzg)
Comment on attachment 8756277 [details] [diff] [review]
adjust manifests for tests that use the clipboard to be in a subsuite (1.1)

Review of attachment 8756277 [details] [diff] [review]:
-----------------------------------------------------------------

r=me under the assumption that this is green on Try
Attachment #8756277 - Flags: review?(ryanvm) → review+
Comment on attachment 8756277 [details] [diff] [review]
adjust manifests for tests that use the clipboard to be in a subsuite (1.1)

Review of attachment 8756277 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think I'm a good reviewer for this.

As long as there are no typos, the list of tests is the right one, your try push works and devtools/clipboard change are approved by others I'm happy with the patch.
Attachment #8756277 - Flags: review?(armenzg) → review+
https://hg.mozilla.org/mozilla-central/rev/1012461fa7bb
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
leaving open for investigating the actual fix in firefox.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/a50249d48b1e
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
how did this get merged twice?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Oh, I manually cherrypicked it onto the merge I did last night, since it seemed harmless enough, and would help maybe help with people's try pushes. Then the original landing merged to m-c when Ryan did a merge this morning.
awesome, yes this would really help reduce the pain of try pushes- glad there was a good explanation for this!
I was discussing this issue with Joel as I can reproduce these clipboard failures locally on a Windows 7 running in virtualbox. Debugging shows that there is some sort of conflict or race when opening and writing to the clipboard. Opening the clipboard fails as a window 'VBoxSharedClipboardClass' already has the clipboard open. Disabling the shared clipboard in virtualbox fixes this problem (the shared clipboard maintains the same clipboard contents across the host and guest os) Joel is going to investigate whether there is something similar in these VMs. 

I tried a try server run with the patch in this bug disabled and similar debugging I used added and this shows that a similar conflict is occurring with something called 'CLCL'. This I think is a clipboard manager. I'm assuming that CLCL was installed as part of debugging this issue as comment 8 suggests?

Can we disable this and try running again?

The try run is:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f5fc710dd592aad0a6ba996e68da27ca7b6065d
Flags: needinfo?(jmaher)
I am working on a new base without clcl installed. Should have it tomorrow (Friday 5/27/2016)
Great stuff Neil!  We will update the bug when we know we have the new image rolled out, ideally we can then just retrigger jobs on your try push, for quicker validation.

To confirm CLCL was a 3rd party clipboard manager we tried installing when we couldn't figure out what was causing the problem- it didn't seem to help too much, so there is no danger in removing it.
Flags: needinfo?(jmaher)
Neil, we have updated the AMI images and by now all instances should be cycles out.  Two choices to verify:
1) retrigger existing jobs (remember which jobs you retriggered) on your try push from yesterday
2) push a new push
I just tried some of the existing jobs again but am getting the same result that CLCL has the clipboard open. I will try a new push as well.
So the new push also has issues with CLCL having the clipboard open:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=423ff65bed35448c8ca398f8c470c11fd004c101

Am I missing something?
There was a tagging error that delayed the instances refreshing. I verified that all the new instances running right now don't have clcl
ok, I retriggered m(1) on win7 opt.  I see the messages in the log file- lets see if the newer M(1) jobs have the same log info.
It looks like some untitled window has the clipboard open. Its process id is '8fc':

03:02:54     INFO -  OleSetClipboard Before 1 151DCD80  Existing Open 002A0156
03:02:54     INFO -  ----
03:02:54     INFO -    Text: 002A0156 [8fc 314 14f4]

You could perhaps find out what that process is. In a GUI of course you can use the task manager.

It may be however that we just need to make waitForClipboard() in SimpleTest.js wait at some step so that whatever is listening to clipboard changes doesn't interfere.
Attachment #8757935 - Flags: review?(ryanvm)
I just stumbled upon this subsuite when running media tests locally. I don't think test_peerConnection_capturedVideo.html is accessing the clipboard. If it is, I'd be very surprised. Any chance we can move it back to media?
Flags: needinfo?(jmaher)
oh good point- let me fix that- I must have done this on accident :(
Flags: needinfo?(jmaher)
Attachment #8757935 - Attachment is obsolete: true
Attachment #8757935 - Flags: review?(ryanvm)
Attachment #8758316 - Flags: review?(ryanvm)
Given the likelihood that this will rear up more as we virtualize etc I think modifying the test is the way to go.


(In reply to Neil Deakin from comment #36)
> It looks like some untitled window has the clipboard open. Its process id is
> '8fc':
> 
> 03:02:54     INFO -  OleSetClipboard Before 1 151DCD80  Existing Open
> 002A0156
> 03:02:54     INFO -  ----
> 03:02:54     INFO -    Text: 002A0156 [8fc 314 14f4]
> 
> You could perhaps find out what that process is. In a GUI of course you can
> use the task manager.
> 
> It may be however that we just need to make waitForClipboard() in
> SimpleTest.js wait at some step so that whatever is listening to clipboard
> changes doesn't interfere.
(In reply to Neil Deakin from comment #36)
> It may be however that we just need to make waitForClipboard() in
> SimpleTest.js wait at some step so that whatever is listening to clipboard
> changes doesn't interfere.

What would this look like in practice? How are these extra clipboard monitoring processes interfering?
Flags: needinfo?(enndeakin)
The issue occurs because waitForClipboard:

1. Puts some default text on the clipboard
2. Retrieves the clipboard contents in a loop until the data return matches the default text
3. Calls the aSetupFn to put the desired data on the clipboard.
4. Does the same as step 2 but waits until the desired data is matched.

When it fails, somewhere between step 2 and 3 Windows sends clipboard changed notifications from the clipboard change made in step 1. When it passes, this happens between step 1 and 2 instead. Likely, we need to wait after step 1 in some manner.
Flags: needinfo?(enndeakin)
Attachment #8758316 - Flags: review?(ryanvm) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b5c8958c61c
move additional tests to the clipboard job. r=RyanVM
Backed out for often failing browser_readerMode.js:

https://hg.mozilla.org/integration/mozilla-inbound/rev/8fe34ba9110c

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b780d702673a1bd061f28d2cb9c00e3d3c38305b
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=29143025&repo=mozilla-inbound

15:14:58     INFO -  34 INFO Tab event received: load
15:14:58     INFO -  35 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | Info panel should have closed -
15:14:58     INFO -  36 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | about:reader loaded after clicking reader mode button -
15:14:58     INFO -  37 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | Element should not be null, when checking visibility -
15:14:58     INFO -  38 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | Reader mode button is present on about:reader -
15:14:58     INFO -  39 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | gURLBar value is about:reader URL -
15:14:58     INFO -  40 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | gURLBar is displaying original article URL -
15:14:58     INFO -  41 INFO TEST-PASS | browser/base/content/test/general/browser_readerMode.js | Clipboard has the given value -
15:14:58     INFO -  42 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_readerMode.js | Test timed out -
Flags: needinfo?(jmaher)
As a trouble shooting step I am going to have runner disable the xen guest agent when a job starts and start it back up after the job is done. The xen guest agent is required for soft control of the instance from the aws api etc. It may however affect the clipboard.
Q, I can push to try, etc. and see if the clipboard job passes, likewise let me know if I should wait for a time window when instances have cycled out so we can validate this change.
Flags: needinfo?(jmaher)
Neil, Q and I chatted for a while last week in London and it seems as though running in a amazon VM will force us to have some type of xen agent always hooked into the clipboard.  There are no official solutions, just potential hacks to the system (which are not a good idea when dealing with xen as we don't realy control that part of the equation much- i.e. it can be updated or fall out of date unbeknownst to us).

Is it possibly to add some logic inside of gecko to help firefox work around this?  Possibly hidden behind a preference or an environment variable?
Flags: needinfo?(enndeakin)
A simple solution to just executeSoon here works locally but not on the real test machines.

I can try putting together some other possible options:

1. Don't wait for a response on Windows (or perhaps Mac) and just have waitForClipboard put data on the clipboard and return
2. Add a clipboard change notification
Flags: needinfo?(enndeakin)
I am not sure if #1 would work for when we try to retrieve information from the clipboard and it isn't there.  Maybe I don't understand the problem enough and this might be suitable!

#2 seems useful, we could redo waitforclipboard to ensure we have a change notification and timeout if we don't get it.
Comment on attachment 8772066 [details]
Bug 1270962 - move another test to clipboard suite for win7vm;

https://reviewboard.mozilla.org/r/64984/#review62166

please ensure that all platforms run the clipboard job properly with this change before landing it.
Attachment #8772066 - Flags: review?(jmaher) → review+
Looks good!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf9d1b657cae

Going to land (but leaving this bug open)
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b125c700ca86
move another test to clipboard suite for win7vm; r=jmaher
https://hg.mozilla.org/mozilla-central/rev/b125c700ca86
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Thanks :tomcat! Leaving this bug open for the main clipboard issue
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8780306 [details] : Somehow the 'subsuite=clipboard' has made it's way back into the dom/media/tests/mochitest/mochitest.ini and it annoys several developers that it results in a 5min timeout if they try to run './mach mochitest dom/media/test/mochitest' locally, as mochitest starts with a that single sub-suite and then does not exit Firefox as it apparently (indicated through the different mochitest UI) believes it only runs a single test case and wait for the developer to quit Firefox or a 5min timeout.
I would prefer if jmaher would review this.
Is it OK to wait until Monday when he's back?
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #61)
> I would prefer if jmaher would review this.
> Is it OK to wait until Monday when he's back?

I only picked you, because I saw that jmaher is on PTO and you r+'ed jmaher's patches.
But sure just a few more days should not matter at this point.
So the problem is that jmaher is not accepting any review requests during his PTO (IMHO a not very useful setting). Let's hope he changes his status on Monday and I can then nominate him for review.
Flags: needinfo?(drno)
Comment on attachment 8780306 [details]
Bug 1270962: test_peerConnection_capturedVideo.html doesn't use clipboard.

https://reviewboard.mozilla.org/r/71026/#review69074

this looks good- I had intended to fix this but my patch to fix it was backed out as it was fiddling with about 15 tests and one of the other tests were problematic.
Attachment #8780306 - Flags: review+
Flags: needinfo?(drno)
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/e4f75aec63b5
test_peerConnection_capturedVideo.html doesn't use clipboard. r=jmaher
Can we file a followup bug on the failure mode mentioned in comment 60? The harness should not work that way.
Ted, I believe that is intentional if you only have a single test case and are running locally.
The intent for that workflow was when specifying a single test file on the commandline, like `mach mochitest path/to/test_foo.html`, the browser would stay open so developers could reload the test page after editing the test. That code path should not be active if a directory is being passed on the commandline.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #68)
> The intent for that workflow was when specifying a single test file on the
> commandline, like `mach mochitest path/to/test_foo.html`, the browser would
> stay open so developers could reload the test page after editing the test.
> That code path should not be active if a directory is being passed on the
> commandline.

I think it's just checking if all it has is a single mochitest-plain test, at that stage it has no idea what anybody passed in (a directory, or a single manifest file with 1 test, or...). Don't know how hard it is to pass that kind of thing around.
Well, we should file a followup so it doesn't get lost, and figure out how to fix it. :)
I filed bug 1295988 to track the issue of leaving the browser open with a directory of 1 test.
https://hg.mozilla.org/mozilla-central/rev/e4f75aec63b5
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8780306 - Flags: review?(armenzg)
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
See Also: → 1385749
We're going to need a fix for this test soon, since the new platforms won't support running w7 without a hypervisor, and this test fails for both w7 and w10.
See Also: → 1392798
Blocks: 1394757
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: