Closed Bug 1182017 Opened 4 years ago Closed 4 years ago

Intermittent test_desktop_all.py TestDesktopUnits.test_units | error: [Errno 10054] An existing connection was forcibly closed by the remote host (due to crash @mozilla::layers::PImageContainer::Transition)

Categories

(Core :: Graphics: Layers, defect, critical)

Tracking

RESOLVED FIXED
mozilla43
Tracking Status
firefox41 --- unaffected
firefox42 + fixed
firefox43 --- fixed
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: roc)

Details

(Keywords: crash, intermittent-failure, regression, Whiteboard: [gfx-noted])

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=11511546&repo=mozilla-inbound

02:49:41 ERROR - TEST-UNEXPECTED-ERROR | test_desktop_all.py TestDesktopUnits.test_units | error: [Errno 10054] An existing connection was forcibly closed by the remote host
Base of stacks from the logs:

14:45:42 INFO - 1 xul.dll!mozilla::layers::PImageContainer::Transition(mozilla::layers::PImageContainer::State,mozilla::ipc::Trigger,mozilla::layers::PImageContainer::State *) [PImageContainer.cpp:67262fc52141 : 28 + 0x14]
14:45:42 INFO - eip = 0x641de9cb esp = 0x119af940 ebp = 0x119af954
14:45:42 INFO - Found by: previous frame's frame pointer
14:45:42 INFO - 2 xul.dll!mozilla::layers::PImageContainerParent::Send__delete__(mozilla::layers::PImageContainerParent *) [PImageContainerParent.cpp:67262fc52141 : 57 + 0x10]
14:45:42 INFO - eip = 0x641de461 esp = 0x119af95c ebp = 0x119af9ac
14:45:42 INFO - Found by: call frame info
14:45:42 INFO - 3 xul.dll!RunnableFunction<void (*)(mozilla::layers::ImageContainerParent *),Tuple1<mozilla::layers::ImageContainerParent *> >::Run() [task.h:67262fc52141 : 420 + 0x4]
14:45:42 INFO - eip = 0x647bee54 esp = 0x119af9b4 ebp = 0x119af9d4
14:45:42 INFO - Found by: call frame info
14:45:42 INFO - 4 xul.dll!MessageLoop::RunTask(Task *) [message_loop.cc:67262fc52141 : 364 + 0xd]
14:45:42 INFO - eip = 0x640ffffe esp = 0x119af9bc ebp = 0x119af9d4
14:45:42 INFO - Found by: call frame info
14:45:42 INFO - 5 xul.dll!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &) [message_loop.cc:67262fc52141 : 372 + 0x6]
14:45:42 INFO - eip = 0x640fb468 esp = 0x119af9dc ebp = 0x119af9e0
14:45:42 INFO - Found by: call frame info
14:45:42 INFO - 6 xul.dll!MessageLoop::DoWork() [message_loop.cc:67262fc52141 : 459 + 0x4]
14:45:42 INFO - eip = 0x640fbcb8 esp = 0x119af9e8 ebp = 0x119afa10
14:45:42 INFO - Found by: call frame info
Severity: normal → critical
Crash Signature: [@ mozilla::layers::PImageContainer::Transition(mozilla::layers::PImageContainer::State,mozilla::ipc::Trigger,mozilla::layers::PImageContainer::State *)]
Component: General → Graphics: Layers
Summary: Intermittent test_desktop_all.py TestDesktopUnits.test_units | error: [Errno 10054] An existing connection was forcibly closed by the remote host → Intermittent test_desktop_all.py TestDesktopUnits.test_units | error: [Errno 10054] An existing connection was forcibly closed by the remote host (due to crash @mozilla::layers::PImageContainer::Transition)
Could be bug 1143575, but not recorded for a day or so after that landed?
Nical and Matt, making sure you're aware of this.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(matt.woodrow)
Whiteboard: [gfx-noted]
(In reply to Mark Banner (:standard8) from comment #43)
> Could be bug 1143575, but not recorded for a day or so after that landed?

Yes, that bug added the code in question.

It looks like we're trying to send the delete message on an actor that's already dead.
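To illustrate that failure mode, here is a minimal sketch, assuming a simplified two-state actor lifecycle. `ActorState`, `Trigger` and this `Transition` signature are hypothetical names for illustration and are not the generated PImageContainer code. The sketch returns false where the stack above shows a crash inside Transition.

```cpp
#include <cassert>

// Hypothetical two-state model of an IPDL-style actor lifecycle.
// These names are illustrative only, not the generated protocol code.
enum class ActorState { Live, Dead };
enum class Trigger { SendDelete };

// Returns true and writes *next if the transition is legal.
// Returns false for an illegal transition; the real code appears to
// treat this case as fatal (assumption based on the crash stack).
bool Transition(ActorState from, Trigger trigger, ActorState* next) {
  switch (from) {
    case ActorState::Live:
      if (trigger == Trigger::SendDelete) {
        *next = ActorState::Dead;  // first delete: Live -> Dead
        return true;
      }
      return false;
    case ActorState::Dead:
      // Any message on an already-deleted actor is rejected.
      return false;
  }
  return false;
}
```

In this model the first delete moves the actor from Live to Dead, and a second delete on the same actor is rejected.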
Flags: needinfo?(matt.woodrow)
Blocks: 1143575
Keywords: crash, regression
[Tracking Requested - why for this release]: Crash regression caused by bug 1143575.

Nothing has been seen in the wild yet, since this is only on Nightly, but I'm nervous that the Hello code is somehow triggering a crash, and we shouldn't let it ship without this being investigated.

Additionally, the Hello tests run here are being run in the content process, so if there's a crash, then it could be triggered by any site.
Given this is a crash regression in new code, and it's unclear whether we'll hit it in release, I'd rather we got someone to look at this crash before it is released.

Roc, can you suggest someone to look at this please?
Flags: needinfo?(robert)
I think I have a fix for this.
Assignee: nobody → roc
Flags: needinfo?(robert)
Flags: needinfo?(nical.bugzilla)
Bug 1182017. Call Send__delete__ immediately rather than through an event. r=nical

Kyle assures me it's safe to call Send__delete__ with references to 'this' on
the stack.
Attachment #8653256 - Flags: review?(nical.bugzilla)
I suspect the actual bug here is that there's nothing keeping the ImageContainerParent alive and it dies (perhaps due to IPDL shutdown) before the Send__delete__ event runs.
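A rough sketch of that suspected race, and of why sending the delete immediately avoids it. Everything here is a hypothetical model: `FakeActor`, `TaskQueue`, `DeferredDelete` and `ImmediateDelete` are illustrative names, not Gecko APIs, and the actor's death is simulated with a flag rather than by actually freeing memory.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for an IPDL actor. It only tracks whether the
// actor is still usable; nothing is really freed in this model.
struct FakeActor {
  bool alive = true;

  // Models the idea of Send__delete__: only valid on a live actor.
  // Returns false where the real code would crash.
  bool SendDelete() {
    if (!alive) return false;
    alive = false;
    return true;
  }

  // Models the IPC layer tearing the actor down (e.g. at shutdown).
  void DestroyedByShutdown() { alive = false; }
};

using TaskQueue = std::queue<std::function<void()>>;

// Runs every queued task in FIFO order.
void RunAll(TaskQueue& queue) {
  while (!queue.empty()) {
    auto task = std::move(queue.front());
    queue.pop();
    task();
  }
}

// Old pattern: post the delete as an event. If shutdown reaches the
// actor before the event runs, the delete hits a dead actor.
bool DeferredDelete(FakeActor& actor, TaskQueue& queue, bool shutdownFirst) {
  bool ok = false;
  queue.push([&actor, &ok] { ok = actor.SendDelete(); });
  if (shutdownFirst) actor.DestroyedByShutdown();
  RunAll(queue);  // the queued task runs while 'ok' is still in scope
  return ok;
}

// New pattern: send the delete right away, leaving no window for
// anything else to kill the actor first.
bool ImmediateDelete(FakeActor& actor) { return actor.SendDelete(); }
```

With `shutdownFirst` set, `DeferredDelete` fails because the actor is already dead when the queued task runs, while `ImmediateDelete` on a live actor always succeeds.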
Comment on attachment 8653256 [details]
MozReview Request: Bug 1182017. Call Send__delete__ immediately rather than through an event. r=nical

https://reviewboard.mozilla.org/r/17397/#review15489

Ship It!
Attachment #8653256 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/c57791ddf02b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Thank you Robert for fixing this.
Please nominate this for Aurora approval when you get a chance.
Flags: needinfo?(roc)
Comment on attachment 8653256 [details]
MozReview Request: Bug 1182017. Call Send__delete__ immediately rather than through an event. r=nical

Approval Request Comment
[Feature/regressing bug #]: 1143575
[User impact if declined]: Possible unpredictable crashes closing windows
[Describe test coverage new/current, TreeHerder]: this code gets exercised a lot in tests
[Risks and why]: Relatively low risk, minor change
[String/UUID change made/needed]: none
Flags: needinfo?(roc)
Attachment #8653256 - Flags: approval-mozilla-aurora?
Comment on attachment 8653256 [details]
MozReview Request: Bug 1182017. Call Send__delete__ immediately rather than through an event. r=nical

Fix a crash, taking it for 42.
Attachment #8653256 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+