Talos testing gives permission denied error on windows when a timeout occurs



7 years ago
6 years ago


(Reporter: BYK, Unassigned)


Firefox Tracking Flags

(Not tracked)



(2 attachments)



7 years ago
Created attachment 571401 [details]
Test results / browser output captured

At the end of testing tp5 with "responsiveness: True" the firefox process becomes a zombie without any UI with a RAM consumption around 120MB. The Python script waits for the process to terminate for several minutes and after that it throws the following:

Traceback (most recent call last):
  File "bcontroller.py", line 235, in <module>
  File "bcontroller.py", line 232, in main
  File "bcontroller.py", line 175, in run
    results_file = open(self.browser_log, "a")
IOError: [Errno 13] Permission denied: 'browser_output.txt'

Also after killing the zombie process manually, testing starts over from cycle 1 without any firefox process being actually launched and terminates with the following:

Failed tp5:
                Stopped Wed, 02 Nov 2011 20:46:09
FAIL: Busted: tp5
FAIL: unrecognized output format
Completed test tp5:
                Stopped Wed, 02 Nov 2011 20:46:09
RETURN: cycle time: 01:04:30<br>
                Stopped Wed, 02 Nov 2011 20:46:09

A stacktrace and browser_output.txt are attached.

Comment 1

7 years ago
Created attachment 571402 [details]
Stack trace

Comment 2

7 years ago
I think the normal code has an open handle with exclusive write permission to the "browser_output.txt" file and when the timeout occurs the handle stays open. Thus, when the timeout branch tries to open the file to log the info, it gets an error.

May be a big try-finally block will solve the problem but it may come with a performance hit.

Comment 3

7 years ago
The process and the test terminates successfully with "responsiveness: False".

Also this may not be a Windows specific issue since the handle should remain open for all OS in case of a timeout.
I believe this is a windows only issue, but a flaw in the harness for when we timeout.  

Would be good to know if the stack trace helps figure out why we ended up timing out.

Comment 5

7 years ago
BYK, this is a really awesome catch.  We've been trying to hunt down a set of failures that look a lot like this but they've been hard for us to reproduce in our automation.  Thanks so much for hunting this down!

Confirming bug.
Ever confirmed: true

Comment 6

7 years ago
The stack trace here seems to be useless due to my inability to generate a proper one. Though noticed that the "hanging problem" does not occur with the nightlies.

Comment 7

7 years ago
Closing for now since it seems to work perfectly OK on nightlies.
Last Resolved: 7 years ago
Resolution: --- → WORKSFORME
I think we should fix this bug to try/catch around the file access in the timeout handler.  Even though this doesn't reproduce on the nightly builds, we can easily reproduce this on aurora and by fixing this bug it will help us see a timeout vs a file access error.
Resolution: WORKSFORME → ---

Comment 9

7 years ago
@John Maher,

Indeed, I missed the second part of the problem. Sorry =)

Comment 10

7 years ago
After some investigation, I discovered that the issue resonates from a line in bcontroller.py which does:
    os.system(somecommand + ' > ' + log_file)
in a separate thread. When the browser is frozen and the timeout occurs, the main thread tries to append into the same "log_file" which Windows keeps locked since the command line execution is not completed yet.

After some more investigation I found out that there isn't a good way to "kill" threads in Python 2.x. This issue might get resolved with the usage of mozprocess or a workaround for the "Windows file-locking" issue or killing off the spawned thread before appending to the log_file.

I have experimented with multiprocess module but it was no good and also does not make much sense since that newly created process will fire another process(the browser process in this case) which sounds like a bit inefficient.


7 years ago
Blocks: 572127

Comment 11

6 years ago
So this looks like a dupe of 572127 instead of a dep of to me.  Closing as such.  Please reopen if i have made a mistake
Last Resolved: 7 years ago6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 572127
You need to log in before you can comment on or make changes to this bug.