Bug 1682928 Comment 27 Edit History

The crash situation is weird because it looks like we're experiencing a null deref on what should be a mutex reentrancy hang. With a local opt build I see the reentrancy deadlock hang, and with a local debug build I see the following report on the reentrancy:

```
 0:14.01 PASS What is this object?
 0:14.01 GECKO(675190) ###!!! ERROR: Potential deadlock detected:
 0:14.01 GECKO(675190) === Cyclical dependency starts at
 0:14.01 GECKO(675190) --- Mutex : EventSourceImpl::mSharedData (currently acquired)
 0:14.01 GECKO(675190)  calling context
 0:14.01 GECKO(675190)   [stack trace unavailable]
 0:14.01 GECKO(675190) === Cycle completed at
 0:14.01 GECKO(675190) --- Mutex : EventSourceImpl::mSharedData (currently acquired)
 0:14.01 GECKO(675190)  calling context
 0:14.01 GECKO(675190)   [stack trace unavailable]
 0:14.01 GECKO(675190) ###!!! Deadlock may happen NOW!
 0:14.01 GECKO(675190) [Parent 675204, Main Thread] ###!!! ASSERTION: Potential deadlock detected:
 0:14.01 GECKO(675190) Cyclical dependency starts at
 0:14.01 GECKO(675190) Mutex : EventSourceImpl::mSharedData (currently acquired)
 0:14.01 GECKO(675190) Cycle completed at
 0:14.01 GECKO(675190) Mutex : EventSourceImpl::mSharedData (currently acquired)
 0:14.01 GECKO(675190) ###!!! Deadlock may happen NOW!
 0:14.02 GECKO(675190) : 'Error', file /home/visbrero/rev_control/hg/mozilla-unified3/xpcom/threads/BlockingResourceBase.cpp:248
```

This finds us hanging in the call at https://hg.mozilla.org/integration/autoland/file/ef19a227311211433bbafe3cbcbbf4e97e883625/dom/base/EventSource.cpp#l1061: we are already holding the lock there, and we don't drop it before calling DispatchFailConnection, which calls IsClosed() at https://hg.mozilla.org/integration/autoland/file/ef19a227311211433bbafe3cbcbbf4e97e883625/dom/base/EventSource.cpp#l1340. IsClosed() is really just a ReadyState check, but that check grabs the same lock.
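
For illustration, here's a minimal standalone sketch of that reentrancy pattern. It is not the Gecko code: `CheckedMutex` and `EventSourceSketch` are hypothetical names, the method bodies are invented, and only `DispatchFailConnection` and `IsClosed` are borrowed from the description above. `CheckedMutex` throws where a plain non-recursive mutex would just hang, so the sketch can run to completion. Dropping the lock before the call is one possible way out, not necessarily the fix that should land.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical non-recursive mutex that reports same-thread re-acquisition
// instead of hanging, loosely mimicking what the debug-build detector reports.
class CheckedMutex {
 public:
  void lock() {
    if (mOwner.load() == std::this_thread::get_id()) {
      throw std::logic_error("reentrant lock: deadlock would happen here");
    }
    mMutex.lock();
    mOwner.store(std::this_thread::get_id());
  }
  void unlock() {
    mOwner.store(std::thread::id());
    mMutex.unlock();
  }

 private:
  std::mutex mMutex;
  std::atomic<std::thread::id> mOwner{std::thread::id()};
};

// Hypothetical stand-in for the object guarding its state with one mutex.
class EventSourceSketch {
 public:
  // The ReadyState check takes the lock itself.
  bool IsClosed() {
    std::lock_guard<CheckedMutex> lock(mMutex);
    return mReadyState == CLOSED;
  }

  // Calls IsClosed(), so it must not be entered with the lock held.
  void DispatchFailConnection() {
    if (IsClosed()) {
      return;
    }
    mFailDispatched = true;
  }

  // Problem pattern: the lock is still held when DispatchFailConnection runs.
  void FailWhileHoldingLock() {
    std::lock_guard<CheckedMutex> lock(mMutex);
    DispatchFailConnection();
  }

  // One possible shape of a fix: end the locked scope before calling out.
  void FailAfterDroppingLock() {
    {
      std::lock_guard<CheckedMutex> lock(mMutex);
      // ... work that needs the shared state would go here ...
    }
    DispatchFailConnection();
  }

  bool FailDispatched() const { return mFailDispatched; }

 private:
  enum ReadyState { CONNECTING, OPEN, CLOSED };
  CheckedMutex mMutex;
  ReadyState mReadyState = CONNECTING;
  bool mFailDispatched = false;
};

// Returns true if the problem pattern trips the reentrancy check.
bool ReentrancyDetected() {
  EventSourceSketch source;
  try {
    source.FailWhileHoldingLock();
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

Calling `ReentrancyDetected()` exercises the problem path, while `FailAfterDroppingLock()` on a fresh `EventSourceSketch` completes and leaves `FailDispatched()` true. With a plain `std::mutex` in place of `CheckedMutex`, relocking on the same thread is undefined behavior and in practice typically hangs, which matches the opt-build hang.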

Note that from discussion with :khuey after trying to reproduce from the self-serve API, it sounds like the necessary symbols currently won't exist (at least by default; I have an outstanding question on that). Edit: khuey identified [this line](https://searchfox.org/mozilla-central/rev/8d290159a6f80f5e33e2bd35c0e4b2df283a18c5/taskcluster/taskgraph/transforms/build.py#204) as the logic that makes release and try builds include full symbols, while autoland builds do not/can't get those symbols, which means self-serve won't work on autoland backouts. I've tentatively raised the question of whether there's a cost reason not to provide the symbols on https://chat.mozilla.org/#/room/#firefox-ci:mozilla.org