Closed Bug 1317900 Opened 7 years ago Closed 7 years ago

Intermittent toolkit/components/extensions/test/mochitest/test_ext_unload_frame.html | Unexpected reply: reply to closing frame

Categories

(WebExtensions :: General, defect, P1)

defect

Tracking

(firefox-esr52 fixed, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- fixed
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: robwu)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Component: WebExtensions: Untriaged → WebExtensions: General
Priority: -- → P3
:robwu - Do you know what is going wrong here?
Flags: needinfo?(rob)
Test failures only occur with e10s, and in the testConnect_and_remove_window test in test_ext_unload_frame.html.

The specific test opens a new window, establishes an extension message channel between the tab's content script and the extension's background page, and the tab immediately closes the window via window.close(). When the background page receives a message, it sends back a reply ("reply to closing frame").
Here I assumed that script execution in the tab would immediately stop after calling window.close(), and that therefore no events should be triggered.

This assumption has been proven wrong by these test failures and the following experiment (both e10s and non-e10s):
1. Visit about:config and set dom.allow_scripts_to_close_windows to true.
2. Start listening on a local port, e.g. via GNU netcat nc -l -p 8008
3. Visit data:text/html,<script>onclick=()=>{close();var x = new XMLHttpRequest();x.open('GET','http://localhost:8008/',false);x.send()}</script>
4. Click in the page to trigger close().
5. Look at the terminal with netcat and observe that the request was sent - so scripts continue executing after calling window.close();

A way to "fix" this bug is to disable the assertion in the test*_and_remove_window tests.


The increased frequency of this bug can indicate that handling window.close() (i.e. unloading a tab initiated by the web page) has become slower (or that IPC between content script -> browser -> content script has become faster).

Let's keep the test for now and see if the failures become more frequent?
If the frequency stays high/grows and my hypothesis is true, then maybe a deeper investigation for the observed performance regression might be needed.
Assignee: nobody → rob
Status: NEW → ASSIGNED
Flags: needinfo?(rob)
Priority: P3 → P1
this reduced in frequency a couple weeks ago, still would be nice to figure out, but not something we will push hard on for a fix in the short term.
Whiteboard: [stockwell unknown]
Comment on attachment 8900248 [details]
Bug 1317900 - address intermittent failure in test_ext_unload_frame.html

I ran the test 20 times, and the test hasn't failed in these runs. So I guess that this patch fixes the intermittent failure.

try: -b do -p linux,linux64,macosx64 -u mochitests,mochitests-o -t none --rebuild 20 --artifact --try-test-paths chrome:toolkit/components/extensions/test/mochitest 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=775e8d9d5acc
Attachment #8900248 - Flags: review?(aswan)
in the future can you just run the single job that has the test case in it 20 times?  I know it is more work, but it would save a lot of resources- for example on osx, we have a limited small pool of hardware that run ALL the production and try jobs- saving 500 machine hours would have helped a lot there.

When I mention this is more work, I typically schedule the tests once, in this case you are looking at |try -b do -p all -u mochitest-e10s-1,mochitest-e10s-2,mochitest-e10s-3,mochitest-e10s-4,mochitest-e10s-5 -t none|

when the mochitests complete look at each raw log and search for the test name, if it ran, then retrigger that 20 times (I select the job and hit the 'r' key).

Given that try push, how can you tell you didn't break anything?  there are hundreds of orange failures
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #42)
> in the future can you just run the single job that has the test case in it
> 20 times?  I know it is more work, but it would save a lot of resources- for
> example on osx, we have a limited small pool of hardware that run ALL the
> production and try jobs- saving 500 machine hours would have helped a lot
> there.

I tried to minimize the resource usage, by:
- Using artifact builds (saves compilation time).
- Restricting the tests to a specific directory (AFAIK it is not possible to run only one test, please enlighten me if it is possible).

> 
> When I mention this is more work, I typically schedule the tests once, in
> this case you are looking at |try -b do -p all -u
> mochitest-e10s-1,mochitest-e10s-2,mochitest-e10s-3,mochitest-e10s-4,
> mochitest-e10s-5 -t none|

How did you know upfront that you have to choose mochitest-e10s-1, etc?


> when the mochitests complete look at each raw log and search for the test
> name, if it ran, then retrigger that 20 times (I select the job and hit the
> 'r' key).

Great tip, thanks.

 
> Given that try push, how can you tell you didn't break anything?  there are
> hundreds of orange failures

I wrote a quick snippet that finds all orange buttons, clicks on one, waits a few seconds, checks the log and then moves on to the next button. I kicked off the script and went on to other things. The next morning, I checked back and there are no failures (if there are failures, the printed list of buttons is not empty).

buttons = [];
(async function(){
 function sleep(ms) {return new Promise(r=>setTimeout(r, ms));}
 var ts=$$('.btn-orange,.btn-orange-classified'), i = 0;
 for (var button of ts) {
  console.log(`${++i}/${ts.length}`);
  button.click();
  await sleep(4000);
  var s = document.querySelector('th-autoclassify-errors').textContent;
  if (s.includes('unload')) {
   buttons.push(button);
  }
 }
})().then(()=>console.log(buttons), console.error);

When I ran that script, I was looking at somewhere in the 500 buttons (and "[404]" in the title). Now the title shows "[607]" with "4 in progress (OS X 10.10 opt tc-M-e10s (4), and OS X 10.10 debug tc-M-e10s (5 5 dt7)", so I guess that the tests were not finished at the time that I started the verification.
I'll run my snippet again, just in case.
that sounds like a useful little script to look at the failures :)

As for knowing which test to run, the test_ext_unload_frame.html is a mochitest-plain file, so browser-chrome,devtools,etc. are not needed.  There is a case that it could fall into the clipboard, or gpu job (many mochitest-plain tests do).  It is very confusing, so unless you know all the magic or even the basics of scheduling it is much simpler to schedule all.

a rule of thumb:
if a test filename matches browser_*.js, run |-u mochitest-e10s-bc| (double check that there is no subsuite in the .ini file)
* caveat, mochitest-devtools are browser_*.js files, but isolated to the devtools/ directory
if a test file matches test_*.(html|js|xhtml) run |-u mochitest-e10s-1,mochitest-e01s-2,...|

there are many different styles, patterns and exceptions- hopefully we can make the scheduling and display on treeherder more useful for checking changes that are isolated to a specific test or a well contained directory.
Comment on attachment 8900248 [details]
Bug 1317900 - address intermittent failure in test_ext_unload_frame.html

https://reviewboard.mozilla.org/r/171636/#review178416

Thanks!
Attachment #8900248 - Flags: review?(aswan) → review+
Pushed by rob@robwu.nl:
https://hg.mozilla.org/integration/autoland/rev/15cfcee45b87
address intermittent failure in test_ext_unload_frame.html r=aswan
https://hg.mozilla.org/mozilla-central/rev/15cfcee45b87
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Whiteboard: [stockwell unknown] → [stockwell unknown][checkin-needed-beta]
https://hg.mozilla.org/releases/mozilla-beta/rev/6dd7c570d480
Whiteboard: [stockwell unknown][checkin-needed-beta] → [stockwell unknown]
Flags: qe-verify-
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.