Closed Bug 699173 Opened 13 years ago Closed 12 years ago

Talos testing gives permission denied error on windows when a timeout occurs

Categories

(Testing :: Talos, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 572127

People

(Reporter: BYK, Unassigned)

References

Details

Attachments

(2 files)

At the end of testing tp5 with "responsiveness: True" the firefox process becomes a zombie without any UI with a RAM consumption around 120MB. The Python script waits for the process to terminate for several minutes and after that it throws the following:

Traceback (most recent call last):
  File "bcontroller.py", line 235, in <module>
    sys.exit(main())
  File "bcontroller.py", line 232, in main
    bcontroller.run()
  File "bcontroller.py", line 175, in run
    results_file = open(self.browser_log, "a")
IOError: [Errno 13] Permission denied: 'browser_output.txt'

Also after killing the zombie process manually, testing starts over from cycle 1 without any firefox process being actually launched and terminates with the following:

Failed tp5:
                Stopped Wed, 02 Nov 2011 20:46:09
FAIL: Busted: tp5
FAIL: unrecognized output format
Completed test tp5:
                Stopped Wed, 02 Nov 2011 20:46:09
RETURN: cycle time: 01:04:30<br>
qm-pxp01:
                Stopped Wed, 02 Nov 2011 20:46:09

A stacktrace and browser_output.txt are attached.
Attached file Stack trace
I think the normal code has an open handle with exclusive write permission to the "browser_output.txt" file and when the timeout occurs the handle stays open. Thus, when the timeout branch tries to open the file to log the info, it gets an error.

May be a big try-finally block will solve the problem but it may come with a performance hit.
The process and the test terminates successfully with "responsiveness: False".

Also this may not be a Windows specific issue since the handle should remain open for all OS in case of a timeout.
I believe this is a windows only issue, but a flaw in the harness for when we timeout.  

Would be good to know if the stack trace helps figure out why we ended up timing out.
BYK, this is a really awesome catch.  We've been trying to hunt down a set of failures that look a lot like this but they've been hard for us to reproduce in our automation.  Thanks so much for hunting this down!

Confirming bug.
Status: UNCONFIRMED → NEW
Ever confirmed: true
The stack trace here seems to be useless due to my inability to generate a proper one. Though noticed that the "hanging problem" does not occur with the nightlies.
Closing for now since it seems to work perfectly OK on nightlies.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
I think we should fix this bug to try/catch around the file access in the timeout handler.  Even though this doesn't reproduce on the nightly builds, we can easily reproduce this on aurora and by fixing this bug it will help us see a timeout vs a file access error.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
@John Maher,

Indeed, I missed the second part of the problem. Sorry =)
After some investigation, I discovered that the issue resonates from a line in bcontroller.py which does:
    os.system(somecommand + ' > ' + log_file)
in a separate thread. When the browser is frozen and the timeout occurs, the main thread tries to append into the same "log_file" which Windows keeps locked since the command line execution is not completed yet.

After some more investigation I found out that there isn't a good way to "kill" threads in Python 2.x. This issue might get resolved with the usage of mozprocess or a workaround for the "Windows file-locking" issue or killing off the spawned thread before appending to the log_file.

I have experimented with multiprocess module but it was no good and also does not make much sense since that newly created process will fire another process(the browser process in this case) which sounds like a bit inefficient.
Status: REOPENED → NEW
Blocks: 572127
So this looks like a dupe of 572127 instead of a dep of to me.  Closing as such.  Please reopen if i have made a mistake
Status: NEW → RESOLVED
Closed: 13 years ago12 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: