Closed Bug 1497769 Opened 11 months ago Closed 10 months ago

[wpt-sync] Sync PR 13447 - [wptrunner] Discard corrupted message queues

Categories

(Testing :: web-platform-tests, enhancement, P4)

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: wptsync, Unassigned)

Details

(Whiteboard: [wptsync downstream])

Sync web-platform-tests PR 13447 into mozilla-central (this bug is closed when the sync is complete).

PR: https://github.com/web-platform-tests/wpt/pull/13447
Details from upstream follow.

Mike Pennisi <mike@mikepennisi.com> wrote:
>  [wptrunner] Discard corrupted message queues
>  
>  "TestRunner" sub-processes forward their standard output streams to the
>  "TestRunnerManager" process via a Python multiprocessing Queue. When
>  such a process produces a large amount of output (e.g. in failing
>  WebDriver specification tests), the data may be buffered in the
>  underlying operating system pipe. In this state, such a process will not
>  exit naturally:
>  
>  > Bear in mind that a process that has put items in a queue will wait
>  > before terminating until all the buffered items are fed by the
>  > "feeder" thread to the underlying pipe. [1]
>  
>  Previously, the TestRunnerManager forcibly terminated the sub-process
>  and re-used the message queue, providing it to a new sub-process and
>  waiting for new items to be inserted. However, the queue's behavior is
>  unpredictable in this state. It has been observed to block indefinitely
>  on GNU/Linux and macOS systems [2].
>  
>  To avoid this behavior, discard the queue and create a new instance for
>  use in subsequent tests.
>  
>  [1] https://docs.python.org/2/library/multiprocessing.html#all-platforms
>  [2] https://github.com/web-platform-tests/wpt/issues/13446
>  
>  ---
>  
>  To help understand the "corrupted queue" condition, I made a simplified demo
>  script:
>  
>  ```python
>  import multiprocessing
>  import Queue
>  
>  def target(queue, item_count, item_size, lock):
>      for _ in xrange(item_count):
>          queue.put('x' * 1024 * item_size)
>  
>      lock.release()
>  
>  def create_buffered_queue():
>      item_count = 1
>      item_size = 1
>  
>      while True:
>          item_count *= 2
>          item_size *= 2
>  
>          queue = multiprocessing.Queue()
>          lock = multiprocessing.Lock()
>          lock.acquire()
>          args = (queue, item_count, item_size, lock)
>          child = multiprocessing.Process(target=target, args=args)
>          child.start()
>          lock.acquire()
>          child.join(1)
>  
>          # The child process has inserted all items but did not exit. This
>          # indicates that the underlying pipe is buffered.
>          if child.is_alive():
>              return queue, item_count, item_size, child
>  
>  def trial(terminate_child):
>      queue, item_count, item_size, child = create_buffered_queue()
>  
>      print 'Queue created with %s %s-kilobyte items' % (item_count, item_size,)
>  
>      if terminate_child:
>          print 'Terminating child process with buffered items'
>          child.terminate()
>          child.join(1)
>  
>      print 'child.is_alive(): %s' % (child.is_alive(),)
>  
>      print 'Flushing queue'
>  
>      while True:
>          try:
>              queue.get(False)
>          except Queue.Empty:
>              break
>  
>      print 'Complete'
>  
>  trial(False)
>  
>  trial(True)
>  ```
>  
>  Ubuntu 16.04, Ubuntu 18.04, and macOS systems all wrote the following to
>  standard output and then hung indefinitely:
>  
>      Queue created with 8 8-kilobyte items
>      child.is_alive(): True
>      Flushing queue
>      Complete
>      Queue created with 8 8-kilobyte items
>      Terminating child process with buffered items
>      child.is_alive(): False
>      Flushing queue
>  
>  Also, a nearby in-line comment may be related, but I'm having trouble
>  interpreting it:
>  
>  >     # This might leak a file handle from the queue
>  
>  Should that be updated?
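
For readers following the description above, here is a minimal sketch of the "discard the queue" idea. It is not the actual wptrunner change: `RunnerManager`, `start_runner`, and `stop_runner` are hypothetical names invented for illustration, and the real `TestRunnerManager` is considerably more involved. The sketch only assumes the standard `multiprocessing` API.

```python
import multiprocessing


class RunnerManager(object):
    """Hypothetical, simplified stand-in for wptrunner's TestRunnerManager."""

    def __init__(self):
        self.queue = multiprocessing.Queue()
        self.child = None

    def start_runner(self, target):
        # Each runner receives whichever queue is current when it starts.
        self.child = multiprocessing.Process(target=target, args=(self.queue,))
        self.child.start()

    def stop_runner(self, timeout=1):
        """Stop the runner. Return True if it had to be forcibly terminated."""
        child, self.child = self.child, None
        if child is None:
            return False

        # Give the runner a chance to exit on its own first.
        child.join(timeout)
        if not child.is_alive():
            return False

        # The runner is still alive, possibly because items it put on the
        # queue have not been flushed to the pipe yet. Kill it.
        child.terminate()
        child.join()

        # The killed process may have been interrupted mid-write, so the old
        # queue is no longer trusted. Discard it instead of re-using it.
        self.queue = multiprocessing.Queue()
        return True
```

The design choice is that a queue is only replaced after a forced termination; a runner that exits on its own leaves the queue untouched. When trying this out with real child processes, start them from under an `if __name__ == "__main__":` guard so the script also works with the "spawn" start method.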
Pushed by wptsync@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/693b05802d98
[wpt PR 13447] - [wptrunner] Discard corrupted message queues, a=testonly
Result changes from PR not available.
https://hg.mozilla.org/mozilla-central/rev/693b05802d98
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64