Closed Bug 1192317 Opened 10 years ago Closed 8 years ago

Intermittent Linux T-e10s(g1) command timed out: 3600 seconds without output running ['/tools/buildbot/bin/python', '-u', 'scripts/scripts/talos_script.py', '--suite', 'g1-e10s', '--add-option', '--webServer,localhost', '--branch-name', 'Mozilla-Inbound-

Categories

(Testing :: Talos, defect, P5)

Unspecified
Linux
defect

Tracking

(e10s+)

RESOLVED WORKSFORME
Tracking Status
e10s + ---

People

(Reporter: RyanVM, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

No description provided.
Intermittent e10s test failure
Priority: -- → P5
The large majority of the errors here are related to Linux g4-e10s, and they have been a consistent source of this bug since that suite was turned on June 25th. The pattern is usually:

    20:54:32 INFO - PROCESS | 13341 | Cycle 1(1): loaded http://localhost:57620/tests/video/video_playback.html (next: http://localhost:57620/tests/video/video_playback.html)
    20:54:33 INFO - PROCESS | 13341 | RSS: Main: 372662272
    20:54:33 INFO - PROCESS | 13341 |
    20:54:43 INFO - PROCESS | 13341 | [Parent 13341] WARNING: Message needs unreceived descriptors channel:7f2a97057000 message-type:65531 header()->num_fds:1 num_fds:0 fds_i:0: file /builds/slave/m-in-l64-000000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 482
    20:54:43 INFO - PROCESS | 13341 |
    20:54:43 INFO - PROCESS | 13341 | ###!!! [Child][MessageChannel] Error: (msgtype=0x92000B,name=PImageBridge::Msg_PTextureConstructor) Channel error: cannot send/recv
    20:54:43 INFO - PROCESS | 13341 |
    20:54:43 INFO - PROCESS | 13341 | IPDL protocol error: constructor for actor failed
    20:54:43 INFO - PROCESS | 13341 | [Child 13401] ###!!! ABORT: IPDL error [PImageBridgeChild]: "constructor for actor failed". abort()ing as a result.: file /builds/slave/m-in-l64-000000000000000000000/build/src/ipc/glue/ProtocolUtils.cpp, line 480
    20:54:43 INFO - PROCESS | 13341 | [Child 13401] ###!!! ABORT: IPDL error [PImageBridgeChild]: "constructor for actor failed". abort()ing as a result.: file /builds/slave/m-in-l64-000000000000000000000/build/src/ipc/glue/ProtocolUtils.cpp, line 480
    20:54:43 INFO - PROCESS | 13341 |
    20:54:43 INFO - PROCESS | 13341 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x2C007D,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
    20:54:43 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x920002,name=PImageBridge::Msg_DidComposite) Channel error: cannot send/recv
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 | ###!!! [Child][MessageChannel] Error: (msgtype=0x420003,name=PCompositorBridge::Msg_DidComposite) Channel error: cannot send/recv
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 | ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 | ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 |
    20:54:47 INFO - PROCESS | 13341 | ###!!! [Parent][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
    ...

This repeats until we time out. It is happening about 5 times/day, and we really should look into fixing it so the test is more reliable and we don't continue to bother the sheriffs.

:ethlin, as you wrote the original test, can you look into reducing or fixing this error?
Flags: needinfo?(ethlin)
(In reply to Joel Maher ( :jmaher ) from comment #123)
> this is happening about 5 times/day, we really should look into fixing this
> so this test is more reliable and we don't continue to bother the sheriffs.
>
> :ethlin, as you wrote the original test, can you look into reducing or
> fixing this error?

Okay, I will check this error. I'll keep the needinfo request as a reminder.
I guess the test is waiting for certain events, such as fullscreen enter/exit events or video events. Sometimes the test never receives the event because the wrong element has focus, or because of some other problem. I'll think about how to make the test more robust.
thanks Ethan, let me know if I can help at all.
I tried removing the fullscreen test from g4. The results look good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d70555ba7df&selectedJob=24743337
Nice, this is good confirmation of what the cause is.
I changed when the test sets focus. The non-e10s results look good, but the e10s test always fails. I'll keep looking for the problem. https://treeherder.mozilla.org/#/jobs?repo=try&revision=c3450c906ff6&selectedJob=25179226
thanks for continuing to look into this Ethan!
Summary: Intermittent Linux T-e10s(g1) command timed out: 3600 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/talos_script.py', '--suite', 'g1-e10s', '--add-option', '--webServer,localhost', '--branch-name', 'Mozilla-Inbound-Non-PGO → Intermittent Linux T-e10s(g1) command timed out: 3600 seconds without output running ['/tools/buildbot/bin/python', '-u', 'scripts/scripts/talos_script.py', '--suite', 'g1-e10s', '--add-option', '--webServer,localhost', '--branch-name', 'Mozilla-Inbound-
:ethlin, there is a needinfo from a long time ago, and we have a consistent pattern of OS X failures here. Is there a chance you can look into this again?
:ethlin, can you respond to the needinfo here? This bug has been silent for the last 10 months!
(In reply to Joel Maher ( :jmaher) from comment #164)
> :ethlin, can you respond to the needinfo here, this bug has been silent for
> the last 10 months!

Sorry, I have been focusing on another project and haven't figured out how to fix this. I suppose we could remove the fullscreen mode from the test for now, since the other resolutions also show consistent results. :jmaher, what do you think?
Flags: needinfo?(ethlin) → needinfo?(jmaher)
I am not familiar with the test specifics, but I support removing the fullscreen mode if that makes the test more reliable and it still provides value. If you show me what to do, I can work on this; otherwise, put a patch together and I can review it.
Flags: needinfo?(jmaher)
I'll have a patch to remove the fullscreen mode soon.
Depends on: 1370155
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME